AWS Certified AI Practitioner(12) - Pricing & Model Improvement
Amazon Bedrock: Pricing & Model Improvement
1️⃣ Pricing Options
🔹 On-Demand (Pay-as-you-go)
- How it works: Pay only for what you use, like an electricity bill.
- Pricing basis:
  - Text Models → Input/Output token count
  - Embedding Models → Input token count
  - Image Models → Number of images generated
- Available Models: Base Models only
- ✅ Pros: Flexible, good for unpredictable workloads
- ❌ Cons: Can become expensive under continuous, long-term use
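To make the pricing basis above concrete, here is a minimal sketch of how an On-Demand bill adds up. The prices are hypothetical placeholders chosen for easy arithmetic, not real Bedrock rates, and the function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextModelPrice:
    """Hypothetical per-1,000-token prices in USD (NOT real Bedrock rates)."""
    input_per_1k: float
    output_per_1k: float


def text_cost(input_tokens: int, output_tokens: int, price: TextModelPrice) -> float:
    """Text models: billed on both input and output tokens."""
    return (input_tokens / 1000) * price.input_per_1k + (output_tokens / 1000) * price.output_per_1k


def embedding_cost(input_tokens: int, input_per_1k: float) -> float:
    """Embedding models: billed on input tokens only."""
    return (input_tokens / 1000) * input_per_1k


def image_cost(images_generated: int, price_per_image: float) -> float:
    """Image models: billed per generated image."""
    return images_generated * price_per_image


# Assumed example prices, for illustration only.
example_price = TextModelPrice(input_per_1k=0.001, output_per_1k=0.002)
request_cost = text_cost(input_tokens=2000, output_tokens=1000, price=example_price)
print(f"Estimated cost for one request: ${request_cost:.4f}")
```

Note that output tokens are priced separately from input tokens, so long responses can dominate the bill even when prompts are short.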
🔹 Batch Mode (Bulk processing, up to 50% discount)
- How it works: Group multiple requests together → results stored as a single file in Amazon S3
- Discount: Up to 50% cheaper than On-Demand
- ✅ Pros: Great for large-scale processing, strong cost savings
- ❌ Cons: No real-time response, results are delayed
- Best use case: Large batch jobs where immediate results are not required
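As a rough illustration of "grouping multiple requests together", the sketch below packs prompts into a JSONL payload, one request per line. The `recordId` and `modelInput` field names reflect the batch input format as I understand it, and the inner request body is a made-up placeholder (the real shape depends on the model), so check both against the current Bedrock documentation before relying on them.

```python
import json


def build_batch_records(prompts, max_tokens=200):
    """Wrap each prompt in one batch record (field names are assumptions)."""
    records = []
    for i, prompt in enumerate(prompts):
        records.append({
            # Assumed: an 11-character identifier used to match outputs to inputs.
            "recordId": f"REQ{i:08d}",
            # Hypothetical request body; the real structure is model-specific.
            "modelInput": {"prompt": prompt, "max_tokens": max_tokens},
        })
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


prompts = ["Summarize document A.", "Summarize document B."]
payload = to_jsonl(build_batch_records(prompts))
print(payload)
```

In practice the resulting file would be uploaded to an S3 bucket and the job's results collected from S3 later, which is why this mode suits work that does not need an immediate answer.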
🔹 Provisioned Throughput (Reserved capacity, guaranteed performance)
- How it works: Like a gym membership: reserve processing capacity for a fixed term (1-month or 6-month commitment)
- Guaranteed performance: A fixed throughput level, i.e., up to a set number of input/output tokens processed per minute
- Available Models: Base, Fine-tuned, and Custom Models
- ✅ Pros: Stable performance and capacity, supports custom models
- ❌ Cons: Not a cost-saving option; its purpose is to guarantee performance
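To see why Provisioned Throughput is about capacity rather than savings, the sketch below compares a flat reserved cost with token-based On-Demand billing. The hourly price, the blended per-token price, and the 730-hour month are all assumptions for illustration, not published rates.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def provisioned_monthly_cost(model_units: int, hourly_price_per_unit: float,
                             hours: float = HOURS_PER_MONTH) -> float:
    """Reserved capacity is billed for every hour, whether it is used or not."""
    return model_units * hourly_price_per_unit * hours


def break_even_tokens(monthly_cost: float, blended_price_per_1k: float) -> float:
    """Monthly tokens at which On-Demand would cost the same as the reservation."""
    return monthly_cost / blended_price_per_1k * 1000


# Hypothetical numbers: 1 model unit at $20/hour, $0.002 per 1K tokens On-Demand.
monthly = provisioned_monthly_cost(model_units=1, hourly_price_per_unit=20.0)
tokens_needed = break_even_tokens(monthly, blended_price_per_1k=0.002)
print(f"Reserved cost per month: ${monthly:,.0f}")
print(f"Break-even volume: {tokens_needed:,.0f} tokens per month")
```

Under these made-up numbers the reservation only matches On-Demand at a very high monthly volume, which fits the point above: choose it for guaranteed throughput or custom models, not to cut the bill.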
Pricing Options Comparison Table
Option | Billing Method | Pricing Basis | Available Models | Pros | Cons | Best Use Case |
---|---|---|---|---|---|---|
On-Demand | Pay-as-you-go | Text: input/output tokens; Embedding: input tokens; Image: images generated | Base Models only | High flexibility; great for unpredictable workloads | Expensive for long-term use | Occasional use / unpredictable demand |
Batch Mode | Bulk processing | Results stored in Amazon S3 | Base Models only | Up to 50% discount; efficient for large-scale jobs | No real-time response; delayed results | Large requests / no need for instant results |
Provisioned Throughput | Reserved capacity (1-month or 6-month commitment) | Guaranteed tokens per minute | Base, Fine-tuned, Custom Models | Guaranteed stable performance; supports custom models | Almost no cost savings | When using custom models / need guaranteed performance |
2️⃣ Model Improvement Techniques (Low → High Cost)
1. Prompt Engineering
- Improve results simply by optimizing prompts
- No extra computation → Lowest cost
2. Retrieval Augmented Generation (RAG)
- Uses an external knowledge database (Vector DB)
- No model retraining → relatively low cost
- Additional cost for building and maintaining the database
RAG = "Model + Search function" → lets the model look up external knowledge it doesn't already have.
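The "Model + Search function" idea can be sketched in a few lines. This is a toy illustration only: the three-dimensional vectors are invented by hand, whereas a real setup would get them from an embedding model and store them in a vector database, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity between two vectors; closer to 1.0 means more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, store, top_k=1):
    """Search step: return the texts whose vectors best match the query."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vector, item["vector"]),
                    reverse=True)
    return [item["text"] for item in ranked[:top_k]]


def build_prompt(question, passages):
    """Augmentation step: put the retrieved passages in front of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Answer using only this context:\n{context}\nQuestion: {question}"


# Toy knowledge base with hand-made vectors (assumed, not real embeddings).
store = [
    {"text": "Batch Mode offers up to a 50% discount.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Provisioned Throughput reserves capacity.", "vector": [0.1, 0.9, 0.0]},
]
passages = retrieve([0.8, 0.2, 0.0], store)
prompt = build_prompt("How much cheaper is Batch Mode?", passages)
print(prompt)
```

The model itself is never retrained here; only the prompt changes, which is why RAG sits low on the cost scale apart from the database upkeep.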
3. Instruction-based Fine-tuning
- Fine-tune the model with labeled data and specific instructions
- Requires extra computation → Higher cost
4. Domain Adaptation Fine-tuning
- Continue training the model on a large domain-specific dataset (typically unlabeled)
- Requires extensive data preparation + heavy computation → Highest cost
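The difference between the two fine-tuning styles shows up mostly in the training data. The sketch below writes one record of each kind as a JSON line; the key names (`prompt`, `completion`, `input`) are assumptions modeled on common JSONL training formats, so the exact schema should be checked for the model being customized.

```python
import json


def instruction_record(prompt: str, completion: str) -> str:
    """Instruction-based fine-tuning: labeled pairs of input and desired answer."""
    return json.dumps({"prompt": prompt, "completion": completion})


def domain_adaptation_record(text: str) -> str:
    """Domain adaptation: raw domain text, with no label attached."""
    return json.dumps({"input": text})


labeled = instruction_record(
    prompt="Classify the ticket: 'My invoice is wrong.'",
    completion="Billing",
)
unlabeled = domain_adaptation_record(
    "Provisioned Throughput reserves model capacity for a fixed term."
)
print(labeled)
print(unlabeled)
```

Labeled pairs are costly to write but fewer are needed, while domain adaptation needs far more raw text and compute, which matches the cost order in the list above.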
3️⃣ Cost Optimization Tips
- Token management → main driver of cost savings
  - Keep prompts concise
  - Limit output length to what's necessary
- Use Batch Mode → up to 50% cheaper
- Choose smaller models → generally cheaper
- Adjusting inference parameters (Temperature, Top-K, Top-P) is not a cost lever
  - Affects model behavior but not pricing
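A quick way to see the effect of token management is to estimate the worst-case cost of a request before sending it. The sketch below uses a rough "about 4 characters per token" heuristic and hypothetical prices; real tokenizers and real rates differ per model, so treat the numbers as illustrative only.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumed heuristic, not a real tokenizer)."""
    return max(1, round(len(text) / chars_per_token))


def worst_case_cost(prompt: str, max_output_tokens: int,
                    input_per_1k: float, output_per_1k: float) -> float:
    """Upper bound on cost: estimated input tokens plus the full output cap."""
    input_tokens = estimate_tokens(prompt)
    return (input_tokens / 1000) * input_per_1k + (max_output_tokens / 1000) * output_per_1k


verbose_prompt = ("Hello! I was wondering if you could possibly help me by writing "
                  "a short summary of the following customer review, if that is okay: ")
concise_prompt = "Summarize this customer review: "

# Hypothetical prices per 1K tokens, for illustration only.
before = worst_case_cost(verbose_prompt, max_output_tokens=1000,
                         input_per_1k=0.001, output_per_1k=0.002)
after = worst_case_cost(concise_prompt, max_output_tokens=200,
                        input_per_1k=0.001, output_per_1k=0.002)
print(f"Before: ${before:.6f}  After: ${after:.6f}")
```

Both levers from the list appear here: the shorter prompt lowers input tokens, and the tighter output cap lowers the maximum number of billable output tokens.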
Final Summary (Exam/Practical Points)
- On-Demand = Flexibility / Batch = Bulk & Discounts / Provisioned = Guaranteed Performance
- Cost order: Prompt Engineering < RAG < Instruction Fine-tuning < Domain Adaptation
- Cost-saving keys: Token management + Batch Mode
All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.