AWS Certified AI Practitioner (15) - LLM Text Generation & Prompt Optimization
🤖 LLM Text Generation & Prompt Optimization
1. How Text is Generated in an LLM
When a model generates text, it assigns a probability to each candidate next word and samples from that distribution.
Example:
“After the rain, the streets were…”
Possible next words and probabilities:
- wet (0.40)
- flooded (0.25)
- slippery (0.15)
- empty (0.10)
- muddy (0.05)
- clean (0.03)
- blocked (0.02)
The model randomly selects a word according to these probabilities, then repeats the process for each subsequent word.
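A minimal Python sketch of that weighted draw, reusing the toy candidates and probabilities above (a real model samples over its entire token vocabulary):

```python
import random

# Next-word candidates and probabilities from the example above.
candidates = ["wet", "flooded", "slippery", "empty", "muddy", "clean", "blocked"]
probs      = [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02]

# One word is drawn at random, weighted by probability:
# "wet" wins about 40% of the time, "blocked" only about 2%.
next_word = random.choices(candidates, weights=probs, k=1)[0]
print("After the rain, the streets were", next_word)
```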
2. Prompt Performance Optimization
🔹 System Prompts
- Define how the model should behave and reply.
- Example: “You are an AWS Cloud teacher.”
🔹 Temperature (0 to 1)
- Controls creativity.
- Low (e.g., 0.2): Conservative, repetitive, focused on likely answers.
- High (e.g., 1.0): More diverse, creative, unpredictable, less coherent.
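Under the hood, temperature rescales the model's raw scores (logits) before they are turned into probabilities. A sketch with hypothetical logits for four candidate words:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; temperature rescales the logits first."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 1.0, 0.5]                 # hypothetical raw scores for 4 words
print(softmax_with_temperature(logits, 0.2))  # low temp: top word dominates (~0.92)
print(softmax_with_temperature(logits, 1.0))  # high temp: flatter, more random (~0.46)
```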
🔹 Top P (Nucleus Sampling)
- Value: 0–1.
- Low (e.g., 0.25): samples only from the smallest set of words covering the top 25% of cumulative probability → coherent output.
- High (e.g., 0.99): considers many more words → diverse, creative output.
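A sketch of the nucleus filter using the probabilities from section 1: candidates are ranked, and sampling is restricted to the smallest set whose cumulative probability reaches Top P (the kept set is then renormalized before sampling):

```python
def top_p_filter(word_probs, p):
    """Keep the smallest set of words whose cumulative probability reaches p."""
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append(word)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = {"wet": 0.40, "flooded": 0.25, "slippery": 0.15,
         "empty": 0.10, "muddy": 0.05, "clean": 0.03, "blocked": 0.02}
print(top_p_filter(probs, 0.25))  # ['wet'] — low Top P, very focused
print(top_p_filter(probs, 0.99))  # all seven candidates survive — very diverse
```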
🔹 Top K
- Limits the number of candidate words.
- Low (e.g., 10): considers only the top 10 words → focused, coherent output.
- High (e.g., 500): considers many words → more variety and creativity.
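Top K is the simpler cutoff; the same toy distribution, keeping only the k most likely words (again renormalized before sampling in practice):

```python
def top_k_filter(word_probs, k):
    """Keep only the k most likely candidate words."""
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]

probs = {"wet": 0.40, "flooded": 0.25, "slippery": 0.15,
         "empty": 0.10, "muddy": 0.05, "clean": 0.03, "blocked": 0.02}
print(top_k_filter(probs, 2))  # ['wet', 'flooded'] — focused
print(top_k_filter(probs, 5))  # five candidates — more variety
```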
🔹 Length
- Maximum length of the generated output (in tokens).
🔹 Stop Sequences
- Tokens that signal the model to stop generating.
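Putting it all together, here is a minimal sketch of how these parameters appear in an Amazon Bedrock InvokeModel request. The model ID, region, prompt, and stop sequence are placeholder assumptions; the body shape follows the Anthropic Claude Messages format, and other model families use different fields:

```python
import json
import boto3

# Sketch only: model ID, region, and request-body shape are assumptions.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "system": "You are an AWS Cloud teacher.",   # system prompt: role/behavior
    "messages": [{"role": "user",
                  "content": "Explain Amazon S3 in one paragraph."}],
    "temperature": 0.2,          # low = conservative, focused output
    "top_p": 0.25,               # nucleus sampling threshold
    "top_k": 10,                 # candidate-word cutoff
    "max_tokens": 300,           # length: cap on output tokens
    "stop_sequences": ["END"],   # generation halts if this sequence appears
}

response = client.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    contentType="application/json",
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```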
✅ Exam Tip (AWS AI Practitioner):
Know the definitions of System Prompts, Temperature, Top P, Top K, Length, and Stop Sequences, and what happens with low vs. high values.
3. Prompt Latency
Latency = how fast the model responds.
Impacted by:
- Model size (larger = slower).
- Model type (e.g., LLaMA vs. Claude).
- Input tokens (longer prompt = slower).
- Output tokens (longer generation = slower).
⚠️ Not impacted by: Temperature, Top P, or Top K (sampling parameters change which token is chosen, not how quickly tokens are produced).
✅ Exam Tip: Expect a question about what affects latency and what does not.
4. Practical Example
Prompt: “Write a short story about a robot learning to cook.”
- Low Creativity (Temp=0.2, Top P=0.25, Top K=10) → safe, repetitive story.
- High Creativity (Temp=1.0, Top P=0.99, Top K=500) → imaginative, unique story (e.g., a robot making crepes with its optical sensors).
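Expressed as inference parameters (values from the bullets above), the two runs differ in only three fields, which could be merged into the Bedrock request body sketched in section 2:

```python
# The two configurations from the example, as inference parameters.
low_creativity  = {"temperature": 0.2, "top_p": 0.25, "top_k": 10}
high_creativity = {"temperature": 1.0, "top_p": 0.99, "top_k": 500}

# e.g., body.update(high_creativity) before calling invoke_model
```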
5. Key Takeaways
- Temperature = randomness/creativity.
- Top P = cumulative probability threshold for candidate words.
- Top K = number of candidate words.
- System Prompt = role/behavior definition.
- Length/Stop Sequences = control output size and ending.
- Latency = depends on model size, type, and token count (not sampling parameters).
📝 Good to Remember for AWS Exam:
- Be clear about how each parameter influences output.
- Understand latency factors.
- Expect scenario questions like: “Which parameter ensures more coherent answers?” or “What does not affect latency?”