🤖 LLM Text Generation & Prompt Optimization

1. How Text is Generated in an LLM

When a model generates text, it predicts the next word (strictly, the next token) by assigning a probability to every candidate.

Example:
“After the rain, the streets were…”
Possible next words and probabilities:

  • wet (0.40)
  • flooded (0.25)
  • slippery (0.15)
  • empty (0.10)
  • muddy (0.05)
  • clean (0.03)
  • blocked (0.02)

The model then samples the next word at random according to these probabilities, rather than always picking the single most likely one.
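This sampling step can be sketched in a few lines of Python, using the example probabilities above (the seed is only there to make the run reproducible):

```python
import random

# Next-word candidates and probabilities from the example above
candidates = ["wet", "flooded", "slippery", "empty", "muddy", "clean", "blocked"]
probs = [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02]

random.seed(0)  # fixed seed so repeated runs give the same word
word = random.choices(candidates, weights=probs, k=1)[0]
print("After the rain, the streets were", word)
```

Run it several times without the seed and "wet" appears most often, but the less likely words still show up occasionally.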


2. Prompt Performance Optimization

🔹 System Prompts

  • Define how the model should behave and reply.
  • Example: “You are an AWS Cloud teacher.”

🔹 Temperature (0 to 1)

  • Controls creativity.
  • Low (e.g., 0.2): Conservative, repetitive, focused on likely answers.
  • High (e.g., 1.0): More diverse, creative, unpredictable, less coherent.
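Under the hood, temperature typically divides the model's raw scores (logits) before the softmax. A minimal sketch with toy scores (the three logit values are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature before softmax:
    # low T sharpens the distribution, high T flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate words
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
# At T=0.2 almost all probability lands on the top word;
# at T=1.0 the mass is spread more evenly.
print(low[0] > high[0])  # True
```

This is why a low temperature yields conservative, repetitive output: the top word dominates the distribution being sampled from.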

🔹 Top P (Nucleus Sampling)

  • Value: 0โ€“1.
  • Low (e.g., 0.25): Samples only from the smallest set of top words whose cumulative probability reaches 25% → coherent, focused output.
  • High (e.g., 0.99): Considers many more words → diverse, creative output.
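A sketch of the nucleus cut-off, reusing the word probabilities from section 1: words are ranked by probability and kept until their cumulative probability reaches p.

```python
def top_p_filter(word_probs, p):
    # Keep the most likely words until their cumulative probability reaches p.
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append(word)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = {"wet": 0.40, "flooded": 0.25, "slippery": 0.15,
         "empty": 0.10, "muddy": 0.05, "clean": 0.03, "blocked": 0.02}
print(top_p_filter(probs, 0.25))  # ['wet'] — "wet" alone already covers 25%
print(top_p_filter(probs, 0.99))  # nearly the whole candidate list stays in play
```

Note that a low Top P can leave just one word, not "25% of the words" — it is a threshold on probability mass, not on vocabulary size.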

🔹 Top K

  • Limits number of candidate words.
  • Low (e.g., 10): Considers top 10 words → focused, coherent output.
  • High (e.g., 500): Considers many → more variety, creativity.
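Top K is the simpler cousin of Top P: keep a fixed number of the most likely words, regardless of how much probability they cover. A sketch with the same example probabilities:

```python
def top_k_filter(word_probs, k):
    # Keep only the k most likely candidate words.
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]

probs = {"wet": 0.40, "flooded": 0.25, "slippery": 0.15,
         "empty": 0.10, "muddy": 0.05, "clean": 0.03, "blocked": 0.02}
print(top_k_filter(probs, 3))  # ['wet', 'flooded', 'slippery']
```

In practice the model renormalizes the surviving candidates' probabilities and then samples from them.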

🔹 Length

  • Sets the maximum length of the generated output, typically measured in tokens.

🔹 Stop Sequences

  • Tokens that signal the model to stop generating.

✅ Exam Tip (AWS AI Practitioner):
Know the definitions of System Prompts, Temperature, Top P, Top K, Length, and Stop Sequences, and what happens with low vs. high values.
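The parameters above usually travel together in one inference configuration. A hypothetical example of how they might look (the key names here are illustrative only; real APIs and Bedrock models each use their own parameter names):

```python
# Hypothetical inference configuration — key names are illustrative,
# not tied to any specific provider's API.
inference_config = {
    "temperature": 0.2,                 # low → conservative, focused answers
    "top_p": 0.9,                       # sample from the top 90% probability mass
    "top_k": 50,                        # consider at most 50 candidate words
    "max_tokens": 256,                  # maximum output length
    "stop_sequences": ["\n\nHuman:"],   # stop generating when this appears
}
print(sorted(inference_config))
```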


3. Prompt Latency

Latency = how fast the model responds.
Impacted by:

  • Model size (larger = slower).
  • Model type (e.g., LLaMA vs Claude).
  • Input tokens (longer prompt = slower).
  • Output tokens (longer generation = slower).

โš ๏ธ Not impacted by: Temperature, Top P, or Top K.

✅ Exam Tip: Expect a question about what affects latency and what does not.


4. Practical Example

Prompt: “Write a short story about a robot learning to cook.”

  • Low Creativity (Temp=0.2, Top P=0.25, Top K=10) → Safe, repetitive story.
  • High Creativity (Temp=1.0, Top P=0.99, Top K=500) → Imaginative, unique story (e.g., robot making crepes with optical sensors).


5. Key Takeaways

  • Temperature = randomness/creativity.
  • Top P = cumulative probability mass threshold for candidate words.
  • Top K = number of candidate words.
  • System Prompt = role/behavior definition.
  • Length/Stop Sequences = control output size and ending.
  • Latency = depends on model size, type, and token count (not sampling parameters).

📘 Good to Remember for AWS Exam:

  • Be clear about how each parameter influences output.
  • Understand latency factors.
  • Expect scenario questions like: “Which parameter ensures more coherent answers?” or “What does not affect latency?”