📘 Amazon Bedrock – Pricing & Model Improvement

1️⃣ Pricing Options

🔹 On-Demand (Pay-as-you-go)

  • How it works: Pay only for what you use, like an electricity bill.
  • Pricing basis
    • Text Models → Input/Output token count
    • Embedding Models → Input token count
    • Image Models → Number of images generated
  • Available Models: Base Models only
  • ✅ Pros: Flexible, good for unpredictable workloads
  • ❌ Cons: Can become expensive if used continuously over time
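
The token-based billing described above can be sketched as a small estimator. This is only an illustration: the function name is hypothetical and the per-1,000-token prices are placeholder values, not actual Bedrock rates.

```python
def on_demand_text_cost(input_tokens, output_tokens,
                        price_in_per_1k, price_out_per_1k):
    """Estimate the on-demand cost of one text-model request.

    Input and output tokens are billed separately, each at its own rate.
    """
    input_cost = (input_tokens / 1000) * price_in_per_1k
    output_cost = (output_tokens / 1000) * price_out_per_1k
    return input_cost + output_cost


# Placeholder prices (USD per 1,000 tokens), chosen only for illustration.
cost = on_demand_text_cost(2000, 500, price_in_per_1k=0.003, price_out_per_1k=0.015)
print(f"Estimated request cost: ${cost:.4f}")
```

An embedding model would use only the input term, and an image model would multiply a per-image price by the number of images generated.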

🔹 Batch Mode (Bulk processing, up to 50% discount)

  • How it works: Submit many requests together as one job → results are written to Amazon S3
  • Discount: Up to 50% cheaper than On-Demand
  • ✅ Pros: Great for large-scale processing, strong cost savings
  • ❌ Cons: No real-time response, results are delayed
  • Best use case: Large batch jobs where immediate results are not required
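
As a rough sketch of what "grouping requests" can look like, the snippet below packs several prompts into JSON Lines text, one record per line. It assumes the batch input uses a `recordId` plus a `modelInput` field per record; the body inside `modelInput` is model-specific, so the `prompt` and `max_tokens` keys here are placeholders, and the helper name is hypothetical.

```python
import json


def build_batch_records(prompts, max_tokens=200):
    """Return JSON Lines text with one batch record per prompt."""
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"REC{i:08d}",
            # Assumed request body; replace with the target model's format.
            "modelInput": {"prompt": prompt, "max_tokens": max_tokens},
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


jsonl_text = build_batch_records(["Summarize document A.", "Summarize document B."])
print(jsonl_text)
```

The resulting text would be uploaded to S3 as the job input. The job itself runs asynchronously, which is why this mode suits work that does not need an immediate answer.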

🔹 Provisioned Throughput (Reserved capacity, guaranteed performance)

  • How it works: Like a gym membership: reserve processing capacity for a fixed commitment term (e.g., 1 month or 6 months)
  • Guaranteed performance: Ensures a maximum number of input/output tokens processed per minute
  • Available Models: Base, Fine-tuned, and Custom Models
  • ✅ Pros: Stable performance and capacity, supports custom models
  • ❌ Cons: Not a cost-saving option; its purpose is to guarantee performance
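
A minimal sizing sketch for reserved capacity, under stated assumptions: capacity is bought in whole units, each unit sustains a fixed number of tokens per minute, and each unit is billed hourly for the whole term. All figures and function names below are hypothetical placeholders, not published Bedrock values.

```python
import math


def model_units_needed(peak_tokens_per_minute, tokens_per_minute_per_unit):
    """Smallest whole number of capacity units that covers the peak load."""
    return math.ceil(peak_tokens_per_minute / tokens_per_minute_per_unit)


def commitment_cost(units, hourly_price_per_unit, months, hours_per_month=730):
    """Total cost of holding the reserved capacity for the whole term."""
    return units * hourly_price_per_unit * hours_per_month * months


# Placeholder figures for illustration only.
units = model_units_needed(peak_tokens_per_minute=45_000,
                           tokens_per_minute_per_unit=20_000)
total = commitment_cost(units, hourly_price_per_unit=20.0, months=1)
print(f"units={units}, one-month commitment=${total:,.2f}")
```

Because the reserved capacity is paid for whether or not it is used, sizing to the peak load buys predictable performance rather than savings.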

📊 Pricing Options Comparison Table

| Option | Billing Method | Pricing Basis | Available Models | Pros | Cons | Best Use Case |
|---|---|---|---|---|---|---|
| On-Demand | Pay-as-you-go | Text: input/output tokens; Embedding: input tokens; Image: generated images | Base Models only | High flexibility; great for unpredictable workloads | Expensive for long-term use | Occasional use / unpredictable demand |
| Batch Mode | Bulk processing | Results stored in Amazon S3 | Base Models only | Up to 50% discount; efficient for large-scale jobs | No real-time response; delayed results | Large requests / no need for instant results |
| Provisioned Throughput | Reserved capacity (1-month or 6-month term) | Guaranteed tokens per minute | Base, Fine-tuned, Custom Models | Guaranteed stable performance; supports custom models | Almost no cost savings | When using custom models / need guaranteed performance |

2️⃣ Model Improvement Techniques (Low → High Cost)

1. Prompt Engineering

  • Improve results simply by optimizing prompts
  • No extra computation → Lowest cost

2. Retrieval Augmented Generation (RAG)

  • Uses an external knowledge database (Vector DB)
  • No model retraining → relatively low cost
  • Additional cost for building and maintaining the database

RAG = "Model + Search function" → lets the model find external knowledge it doesn't already know.
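
The "Model + Search function" idea can be sketched in a few lines: rank stored passages by similarity to the question, then place the best match in the prompt. The vectors below are hand-written toy values standing in for real embeddings, the in-memory list stands in for a vector database, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, k=1):
    """Return the k passages whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Attach the retrieved passages to the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy "vector DB": (passage, embedding) pairs with made-up vectors.
store = [
    ("Refunds are processed within 5 business days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.1, 0.8, 0.2]),
]
question_vec = [0.8, 0.2, 0.1]  # made-up embedding of the question
passages = retrieve(question_vec, store, k=1)
print(build_prompt("How long do refunds take?", passages))
```

The model itself is never retrained here. The extra cost comes from embedding, storing, and searching the documents.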

3. Instruction-based Fine-tuning

  • Fine-tune the model with labeled data and specific instructions
  • Requires extra computation → Higher cost

4. Domain Adaptation Fine-tuning

  • Retrain the model with a large domain-specific dataset
  • Requires extensive data preparation + heavy computation → Highest cost

3️⃣ Cost Optimization Tips

  • Token management → main driver of cost savings
    • Keep prompts concise
    • Limit output length to what's necessary
  • Use Batch Mode → up to 50% cheaper
  • Choose smaller models → generally cheaper
  • Adjusting inference parameters (Temperature, Top-K, Top-P) is not a cost lever
    • Affects model behavior but not pricing
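
To see how these tips combine, here is a small sketch comparing a verbose request, a trimmed request, and the trimmed request run in Batch Mode. The prices and token counts are placeholders, the function name is hypothetical, and the 50% batch discount is the upper bound quoted above rather than a guaranteed rate.

```python
def estimated_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k,
                   batch=False, batch_discount=0.5):
    """Estimate request cost; Temperature/Top-K/Top-P do not appear because
    they change model behavior, not the number of billed tokens."""
    cost = (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k
    return cost * (1 - batch_discount) if batch else cost


PRICE_IN, PRICE_OUT = 0.003, 0.015  # placeholder USD per 1,000 tokens

baseline = estimated_cost(4000, 1000, PRICE_IN, PRICE_OUT)       # verbose prompt, long output
trimmed = estimated_cost(1500, 300, PRICE_IN, PRICE_OUT)         # concise prompt, capped output
trimmed_batch = estimated_cost(1500, 300, PRICE_IN, PRICE_OUT, batch=True)

print(f"baseline={baseline:.4f} trimmed={trimmed:.4f} trimmed+batch={trimmed_batch:.4f}")
```

Choosing a smaller model would show up in this sketch as lower values for the two price arguments.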

📝 Final Summary (Exam/Practical Points)

  • On-Demand = Flexibility / Batch = Bulk & Discounts / Provisioned = Guaranteed Performance
  • Cost order: Prompt Engineering < RAG < Instruction Fine-tuning < Domain Adaptation
  • Cost-saving keys: Token management + Batch Mode