📚 Amazon Bedrock – Setting up RAG & Knowledge Base (Hands-on)

This guide explains how to set up a Retrieval-Augmented Generation (RAG) pipeline and a Knowledge Base in Amazon Bedrock, using Amazon S3 for storage and Amazon OpenSearch Serverless as the vector database.


1. 🔐 Prerequisites

  • IAM user (not the root user)
  • AdministratorAccess policy attached to the IAM user
  • AWS services:
    • Amazon Bedrock
    • Amazon S3
    • Amazon OpenSearch Serverless (or external vector DB)
  • PDF or text document to upload (e.g., evolution_of_the_internet_detailed.pdf)

2. 🛠 Step-by-Step Setup

Step 1 – Create an IAM User

  1. Go to IAM Console → Users → Create User.

  2. Enter a username (e.g., stephane).

  3. Enable AWS Management Console Access.

  4. Set a custom password.

  5. Attach the AdministratorAccess policy.

  6. Save the sign-in URL, username, and password.

  7. Log in as the IAM user (not root).
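
The console steps above could also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials that are allowed to manage IAM are already configured. The function name `create_admin_user` is hypothetical; the `iam` argument is expected to be `boto3.client("iam")`.

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def create_admin_user(iam, username, password):
    """Create an IAM user with console access and AdministratorAccess.

    `iam` is expected to be boto3.client("iam"). It is passed in so the
    function can be exercised without touching a real account.
    """
    iam.create_user(UserName=username)
    # Console sign-in with a custom password that does not need resetting.
    iam.create_login_profile(
        UserName=username, Password=password, PasswordResetRequired=False
    )
    # Attach the AWS managed AdministratorAccess policy.
    iam.attach_user_policy(UserName=username, PolicyArn=ADMIN_POLICY_ARN)
    return ADMIN_POLICY_ARN
```

A call would look like `create_admin_user(boto3.client("iam"), "stephane", "<your password>")`. The password still has to satisfy the account's password policy.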


Step 2 – Create a Knowledge Base in Amazon Bedrock

  1. In Amazon Bedrock, go to Knowledge Bases → Create Knowledge Base.
  2. Set the name (default is fine).
  3. IAM permissions → Create and use a new service role.
  4. Data Source → Select Amazon S3.
  5. Alternative sources (optional):
    • Web crawler (webpages)
    • Confluence
    • Salesforce
    • SharePoint


Step 3 – Create an Amazon S3 Bucket & Upload Documents

  1. Go to Amazon S3 → Create bucket.
    • Region: us-east-1
    • Bucket name: must be globally unique (e.g., my-demo-bucket-knowledgebase-danny)
  2. Upload your document:
    • Example: evolution_of_the_internet_detailed.pdf
  3. Confirm the object appears in the bucket.
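
For readers who prefer to script this step, here is a rough sketch with boto3. The helper names are hypothetical, the name check covers only a simplified subset of the S3 naming rules, and `s3` is assumed to be `boto3.client("s3")`. One known quirk is handled explicitly: for us-east-1 the `CreateBucketConfiguration` argument has to be left out.

```python
import re

# Simplified subset of the S3 general-purpose bucket naming rules:
# 3-63 characters, lowercase letters, digits, dots and hyphens,
# beginning and ending with a letter or digit, no adjacent dots.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name):
    return bool(_BUCKET_RE.match(name)) and ".." not in name


def create_bucket_kwargs(bucket, region):
    """Build keyword arguments for s3.create_bucket.

    us-east-1 is the default Region and is requested without a
    CreateBucketConfiguration; other Regions need a LocationConstraint.
    """
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_and_upload(s3, bucket, region, local_path, key):
    """Create the bucket, upload one file, and confirm the object exists."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    s3.create_bucket(**create_bucket_kwargs(bucket, region))
    s3.upload_file(local_path, bucket, key)
    s3.head_object(Bucket=bucket, Key=key)  # raises if the object is missing
```

Global uniqueness of the name cannot be checked locally; `create_bucket` fails if another account already owns it.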


Step 4 – Connect S3 to Bedrock Knowledge Base

  1. In Bedrock KB creation:
    • Select your S3 bucket as the data source.
    • Click Next.
  2. Embedding Model:
    • Select Amazon Titan Text Embeddings V2 (default dimensions).
  3. Vector Database:
    • For the AWS exam → Amazon OpenSearch Serverless is the common choice.
    • External free option → Pinecone (free tier available).
  4. Complete the KB creation.
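
The same choices (Titan V2 embeddings, OpenSearch Serverless storage, S3 data source) map onto the boto3 `bedrock-agent` calls `create_knowledge_base` and `create_data_source`. The sketch below only builds the request arguments. The builder names, index name, and field names are placeholders, and it assumes the service role, collection, and vector index already exist, which the console can otherwise create for you.

```python
TITAN_V2_ARN = (
    "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
)


def knowledge_base_request(name, role_arn, collection_arn, index_name):
    """Build keyword arguments for bedrock-agent create_knowledge_base."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": TITAN_V2_ARN
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Placeholder field names; they must match the index mapping.
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }


def s3_data_source_request(kb_id, bucket, name="s3-documents"):
    """Build keyword arguments for bedrock-agent create_data_source."""
    return {
        "knowledgeBaseId": kb_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": f"arn:aws:s3:::{bucket}"},
        },
    }
```

With `agent = boto3.client("bedrock-agent")`, the dictionaries would be passed as `agent.create_knowledge_base(**knowledge_base_request(...))` and `agent.create_data_source(**s3_data_source_request(...))`.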

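⚠️ Cost Warning:
Amazon OpenSearch Serverless bills per OCU-hour, even when the collection is idle. At $0.24 per OCU-hour, ~$172/month corresponds to 1 OCU running continuously (0.5 OCU for indexing + 0.5 OCU for search, the minimum without standby replicas); 2 full OCUs would cost about $346/month.
Delete resources after use to avoid charges.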
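
The monthly figure can be checked with a little arithmetic. This sketch assumes the $0.24 per OCU-hour rate quoted above and a 720-hour (30-day) month; the function name is made up for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # 720 hours, an approximation of one month


def monthly_cost(ocus, usd_per_ocu_hour=0.24):
    """Cost in USD of running `ocus` OCUs continuously for one month."""
    return round(ocus * usd_per_ocu_hour * HOURS_PER_MONTH, 2)


one_ocu = monthly_cost(1)   # 0.5 OCU indexing + 0.5 OCU search
two_ocus = monthly_cost(2)  # two full OCUs
```

So ~$172/month is what 1 OCU in total costs at that rate, while 2 full OCUs come to roughly twice as much.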


Step 5 – Sync Data to Vector Database

  1. Open your Knowledge Base.
  2. Click Sync to run the pipeline: S3 data → embeddings → vector database.
  3. In OpenSearch Service:
    • View your collection and indexes.
    • Each chunk of your document is stored as a vector.
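
Outside the console, a sync corresponds to an ingestion job. Below is a hedged sketch that starts one and polls until it finishes. `agent` is assumed to be `boto3.client("bedrock-agent")`, and the function name, polling interval, and injectable `sleep` are illustrative choices.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(agent, kb_id, ds_id, poll_seconds=10, sleep=time.sleep):
    """Start an ingestion job and poll until it reaches a terminal state."""
    job = agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
    job_id = job["ingestionJob"]["ingestionJobId"]
    while True:
        status = agent.get_ingestion_job(
            knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
        )["ingestionJob"]["status"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)  # wait before asking again
```

The knowledge base ID and data source ID are shown on the Knowledge Base page in the console.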


Step 6 – Test the Knowledge Base

  1. Configure a model (e.g., Anthropic Claude Haiku).
  2. Ask a question (e.g., "Who invented the World Wide Web?").
  3. Bedrock will:
    • Perform vector similarity search (KNN search).
    • Retrieve relevant chunks from the KB.
    • Augment the prompt with retrieved text.
    • Generate an answer with source citations.
  4. Click the source link → View the PDF in S3.
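
The same test can be run from code through the `bedrock-agent-runtime` client's `retrieve_and_generate` call. This is a sketch under assumptions: `runtime` is `boto3.client("bedrock-agent-runtime")`, `model_arn` is the ARN of a model you have access to, and the helper name is hypothetical.

```python
def ask_knowledge_base(runtime, kb_id, model_arn, question):
    """Return (answer, source URIs) for one question against the KB."""
    response = runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    answer = response["output"]["text"]
    # Collect the distinct S3 URIs cited for the answer, in order.
    sources = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return answer, sources
```

The returned URIs play the role of the source link in the console: they point at the S3 objects the retrieved chunks came from.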


3. 🧠 How It Works Internally

📈 RAG Data Flow Diagram

flowchart TD
A["📂 Amazon S3 PDF"] --> B["✂️ Chunking & Embedding Creation<br/>(Amazon Titan)"]
B --> C["🗄 Vector Database<br/>OpenSearch Serverless"]
C --> D["🔍 KNN Similarity Search"]
D --> E["📑 Relevant Chunks Retrieved"]
E --> F["📝 Combined with Original Query<br/>→ Augmented Prompt"]
F --> G["🤖 Foundation Model Generates Answer"]

  • Chunking: Splits the document into smaller parts.
  • Embeddings: Numeric vector representations of text.
  • KNN Search: Finds the k chunks whose embeddings are nearest to the query embedding, i.e. the most semantically similar ones.
  • Augmented Prompt: Original query + retrieved text → better answer.
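
To make the four terms concrete, here is a toy, pure-Python version of the flow. It is only an illustration: the word-count "embedding" is a stand-in for Amazon Titan, the brute-force search is a stand-in for the vector database, and all function names are invented for this sketch.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Chunking: split text into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: lowercase word counts (NOT what Titan computes)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn(query, chunks, k=2):
    """KNN search: the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def augmented_prompt(query, chunks, k=2):
    """Augmented prompt: retrieved text followed by the original query."""
    context = "\n".join(f"- {c}" for c in knn(query, chunks, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"
```

The string returned by `augmented_prompt` is what would be sent to the foundation model in the final step. A real embedding model captures meaning rather than word overlap, so it can match chunks that share no words with the query.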

4. 🛑 Cleanup (Avoid Unnecessary Costs)

After testing:

  1. Delete the Knowledge Base in Bedrock.
  2. Delete the OpenSearch Serverless collection.
  3. (Optional) Delete the S3 bucket, or keep it (storage cost is low).
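
The cleanup order above can be sketched in code as well. The clients are assumed to be `boto3.client("bedrock-agent")`, `boto3.client("opensearchserverless")`, and optionally `boto3.client("s3")`. The function name is hypothetical, and the bucket handling assumes a small, unversioned demo bucket.

```python
def cleanup(agent, aoss, kb_id, collection_id, s3=None, bucket=None):
    """Delete the knowledge base, the collection, and optionally the bucket."""
    agent.delete_knowledge_base(knowledgeBaseId=kb_id)
    # The collection is what keeps accruing OCU charges.
    aoss.delete_collection(id=collection_id)
    if s3 is not None and bucket:
        # A bucket must be empty before it can be deleted. This handles one
        # page of up to 1,000 objects, enough for a small demo bucket.
        listing = s3.list_objects_v2(Bucket=bucket)
        for obj in listing.get("Contents", []):
            s3.delete_object(Bucket=bucket, Key=obj["Key"])
        s3.delete_bucket(Bucket=bucket)
```

Leaving out `s3` and `bucket` keeps the bucket, matching the optional third step.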


5. 📌 Exam Tips

  • Always use an IAM user (the root user cannot create a Bedrock Knowledge Base).
  • Vector DB Options in AWS:
    • OpenSearch (real-time search, KNN)
    • Aurora PostgreSQL (pgvector)
    • Neptune Analytics (graph-based RAG)
    • S3 Vectors (low cost, sub-second search)
  • External: Pinecone, Redis, MongoDB Atlas Vector Search.
  • Bedrock KB supports multiple data sources, not just S3.
  • Remember: RAG = Retrieve external data + Augment prompt + Generate answer.

✅ Summary:
You've created a Bedrock Knowledge Base backed by Amazon S3 and OpenSearch Serverless, generated embeddings with Titan, performed KNN search, and tested retrieval-augmented responses with citations.