Introduction: Amazon Bedrock is AWS’s fully managed service for building generative AI applications with foundation models. Made generally available in September 2023, Bedrock provides a unified API for accessing models from Anthropic, Meta, Mistral, Cohere, and Amazon’s own Titan family. What sets Bedrock apart is its deep integration with the AWS ecosystem, including built-in RAG with Knowledge Bases, agentic workflows with Bedrock Agents, and enterprise-grade safety controls with Guardrails. This guide covers everything from basic model invocation to building sophisticated AI agents on AWS infrastructure.

Capabilities and Features
AWS Bedrock provides comprehensive capabilities for enterprise AI:
- Multi-Model Access: Claude, Llama, Mistral, Titan, Cohere, and Stability AI models
- Bedrock Agents: Build autonomous agents with action groups and knowledge bases
- Knowledge Bases: Managed RAG with automatic chunking, embedding, and retrieval
- Guardrails: Content filtering, PII detection, and topic blocking
- Fine-Tuning: Customize models with your own data
- Provisioned Throughput: Guaranteed capacity for production workloads
- Model Evaluation: Compare models on your specific use cases
- Streaming: Real-time response streaming for supported models
- Converse API: Unified chat interface across supported models
- AWS Integration: IAM, CloudWatch, VPC, and PrivateLink support
Getting Started
Set up AWS credentials and install boto3:
# Install AWS SDK
pip install boto3
# Configure AWS credentials
aws configure
# Or set environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
# Enable model access in AWS Console
# Navigate to Bedrock > Model access > Request access
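With credentials configured and model access granted, a quick way to confirm the setup is to list the foundation models available in your region. This is a minimal sketch using the boto3 control-plane client (service name "bedrock", as opposed to "bedrock-runtime", which is used for inference):
import boto3

# Control-plane client: model management, not inference
bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the foundation models available in this region
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(f"{model['modelId']} ({model['providerName']})")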
Basic Model Invocation
Invoke foundation models with the Bedrock Runtime:
import boto3
import json

# Initialize Bedrock Runtime client
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)

# Invoke Claude model
def invoke_claude(prompt: str) -> str:
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        })
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]

# Invoke Llama model
def invoke_llama(prompt: str) -> str:
    response = bedrock_runtime.invoke_model(
        modelId="meta.llama3-70b-instruct-v1:0",
        body=json.dumps({
            # Llama 3 instruct prompt template (blank line after each header tag)
            "prompt": f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
            "max_gen_len": 1024,
            "temperature": 0.7
        })
    )
    result = json.loads(response["body"].read())
    return result["generation"]

# Invoke Titan model
def invoke_titan(prompt: str) -> str:
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-text-express-v1",
        body=json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 1024,
                "temperature": 0.7
            }
        })
    )
    result = json.loads(response["body"].read())
    return result["results"][0]["outputText"]

# Example usage
response = invoke_claude("Explain microservices architecture")
print(response)
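A common stumbling block is invoking a model before access has been granted in the console, which fails with an AccessDeniedException. Here is a small, hedged sketch of catching that case with botocore (the wrapper name safe_invoke is illustrative):
from botocore.exceptions import ClientError

def safe_invoke(prompt: str) -> str:
    """Invoke Claude, surfacing a clearer message if model access is missing."""
    try:
        return invoke_claude(prompt)
    except ClientError as err:
        if err.response["Error"]["Code"] == "AccessDeniedException":
            raise RuntimeError(
                "Model access not enabled - request it under Bedrock > Model access"
            ) from err
        raise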
Converse API: Unified Chat Interface
Use the Converse API for consistent chat across all models:
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def chat_with_model(model_id: str, messages: list, system_prompt: str = None) -> str:
    """Unified chat interface for any Bedrock model."""
    kwargs = {
        "modelId": model_id,
        "messages": messages,
        "inferenceConfig": {
            "maxTokens": 1024,
            "temperature": 0.7
        }
    }
    if system_prompt:
        kwargs["system"] = [{"text": system_prompt}]
    response = bedrock_runtime.converse(**kwargs)
    return response["output"]["message"]["content"][0]["text"]

# Works with any supported model
models = [
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "meta.llama3-70b-instruct-v1:0",
    "mistral.mistral-large-2402-v1:0"
]
messages = [
    {"role": "user", "content": [{"text": "What is serverless computing?"}]}
]
for model in models:
    print(f"\n{model}:")
    response = chat_with_model(model, messages)
    print(response[:200] + "...")

# Multi-turn conversation
conversation = []

def chat(user_message: str, model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
    conversation.append({
        "role": "user",
        "content": [{"text": user_message}]
    })
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=conversation,
        system=[{"text": "You are a helpful AWS solutions architect."}]
    )
    assistant_message = response["output"]["message"]
    conversation.append(assistant_message)
    return assistant_message["content"][0]["text"]

print(chat("What's the best way to deploy a web application on AWS?"))
print(chat("How would I add a database to that architecture?"))
Knowledge Bases for RAG
Build RAG applications with Bedrock Knowledge Bases:
import boto3

# bedrock-agent is the control plane (create/manage knowledge bases);
# bedrock-agent-runtime is used to query them
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Creating the Knowledge Base itself is best done via the console or
# infrastructure as code (e.g. Terraform); this example shows the query interface

def query_knowledge_base(
    knowledge_base_id: str,
    query: str,
    model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"
) -> dict:
    """Query a knowledge base with RAG."""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": f"arn:aws:bedrock:us-east-1::foundation-model/{model_id}",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 5
                    }
                }
            }
        }
    )
    return {
        "answer": response["output"]["text"],
        "citations": [
            {
                "text": citation["retrievedReferences"][0]["content"]["text"],
                "source": citation["retrievedReferences"][0]["location"]
            }
            for citation in response.get("citations", [])
            if citation.get("retrievedReferences")
        ]
    }

# Query the knowledge base
result = query_knowledge_base(
    knowledge_base_id="YOUR_KB_ID",
    query="How do I configure auto-scaling for ECS?"
)
print(f"Answer: {result['answer']}")
for citation in result['citations']:
    print(f"Source: {citation['source']}")
Bedrock Agents
Build autonomous agents with action groups:
import boto3
import json

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def invoke_agent(
    agent_id: str,
    agent_alias_id: str,
    session_id: str,
    prompt: str
) -> str:
    """Invoke a Bedrock Agent."""
    response = bedrock_agent_runtime.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=prompt
    )
    # Process the streaming event response
    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            completion += event["chunk"]["bytes"].decode()
    return completion

# Example: Invoke an agent
response = invoke_agent(
    agent_id="YOUR_AGENT_ID",
    agent_alias_id="YOUR_ALIAS_ID",
    session_id="session-123",
    prompt="Check the status of order #12345 and update the customer"
)
print(response)

# Lambda function for an Action Group
def lambda_handler(event, context):
    """Handle agent action group invocations."""
    action_group = event["actionGroup"]
    api_path = event["apiPath"]
    parameters = event.get("parameters", [])

    # Route to the appropriate handler
    # (get_order_status and notify_customer are your own business logic)
    if api_path == "/orders/{orderId}":
        order_id = next(p["value"] for p in parameters if p["name"] == "orderId")
        result = get_order_status(order_id)
    elif api_path == "/customers/{customerId}/notify":
        customer_id = next(p["value"] for p in parameters if p["name"] == "customerId")
        # Request body properties arrive as a list of name/value pairs
        body_props = (
            event.get("requestBody", {})
            .get("content", {})
            .get("application/json", {})
            .get("properties", [])
        )
        message = next((p["value"] for p in body_props if p["name"] == "message"), None)
        result = notify_customer(customer_id, message)
    else:
        result = {"error": "Unknown action"}

    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": action_group,
            "apiPath": api_path,
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {
                    "body": json.dumps(result)
                }
            }
        }
    }
Guardrails for Safety
Apply Guardrails to filter harmful content, block off-limits topics, and detect PII in model inputs and outputs:
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def invoke_with_guardrails(
    prompt: str,
    guardrail_id: str,
    guardrail_version: str = "DRAFT"
) -> dict:
    """Invoke model with guardrails applied."""
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
            "trace": "enabled"  # include trace details when the guardrail intervenes
        }
    )
    # Check if the guardrail intervened
    if response.get("stopReason") == "guardrail_intervened":
        return {
            "blocked": True,
            "reason": response.get("trace", {}).get("guardrail", {})
        }
    return {
        "blocked": False,
        "response": response["output"]["message"]["content"][0]["text"]
    }

# Example with guardrails
result = invoke_with_guardrails(
    prompt="Tell me about AWS security best practices",
    guardrail_id="YOUR_GUARDRAIL_ID"
)
if result["blocked"]:
    print(f"Request blocked: {result['reason']}")
else:
    print(result["response"])
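Guardrails can also be applied to text directly, without invoking a model, via the ApplyGuardrail API, which is useful for screening user input or content from other sources. A sketch under the assumption you reuse the same guardrail ID (the helper name check_text is illustrative):
def check_text(text: str, guardrail_id: str, guardrail_version: str = "DRAFT") -> bool:
    """Return True if the guardrail allows the text, False if it intervenes."""
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="INPUT",  # or "OUTPUT" to screen model responses
        content=[{"text": {"text": text}}]
    )
    return response["action"] != "GUARDRAIL_INTERVENED"

print(check_text("Tell me about AWS security best practices", "YOUR_GUARDRAIL_ID"))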
Streaming Responses
Stream tokens back as they are generated instead of waiting for the full response:
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def stream_response(prompt: str, model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
    """Stream responses from Bedrock models."""
    response = bedrock_runtime.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024}
    )
    for event in response["stream"]:
        if "contentBlockDelta" in event:
            text = event["contentBlockDelta"]["delta"].get("text", "")
            print(text, end="", flush=True)
        elif "messageStop" in event:
            print("\n--- Stream complete ---")

# Stream a response
stream_response("Write a detailed explanation of AWS Lambda")
Benchmarks and Performance
Approximate on-demand pricing (us-east-1) and typical latency at the time of writing:
| Model | Input Cost | Output Cost | Latency (p50) |
|---|---|---|---|
| Claude 3 Sonnet | $3/M tokens | $15/M tokens | ~2s |
| Claude 3 Haiku | $0.25/M tokens | $1.25/M tokens | ~0.5s |
| Llama 3 70B | $2.65/M tokens | $3.50/M tokens | ~1.5s |
| Mistral Large | $4/M tokens | $12/M tokens | ~1.8s |
| Titan Text Express | $0.20/M tokens | $0.60/M tokens | ~0.8s |
| Knowledge Base Query | $0.035/query | + model costs | ~3s |
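To turn the table above into a rough monthly estimate, multiply your expected token volume by the per-million-token rates. A small sketch using the Claude 3 Sonnet figures from the table (prices are illustrative and change over time; check the pricing page for current rates):
def estimate_monthly_cost(
    requests_per_day: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_m: float = 3.00,    # Claude 3 Sonnet input, from the table above
    output_price_per_m: float = 15.00,  # Claude 3 Sonnet output
) -> float:
    """Rough monthly cost in USD for on-demand inference."""
    monthly_input = requests_per_day * 30 * avg_input_tokens
    monthly_output = requests_per_day * 30 * avg_output_tokens
    return (monthly_input / 1_000_000) * input_price_per_m + \
           (monthly_output / 1_000_000) * output_price_per_m

# 10,000 requests/day, ~500 input and ~300 output tokens each
print(f"${estimate_monthly_cost(10_000, 500, 300):,.2f} per month")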
When to Use AWS Bedrock
Best suited for:
- Enterprise applications requiring AWS ecosystem integration
- Multi-model strategies with unified API access
- Applications needing managed RAG (Knowledge Bases)
- Compliance-sensitive workloads (HIPAA, SOC2, FedRAMP)
- Building autonomous agents with AWS infrastructure
- Teams with existing AWS investments
Consider alternatives when:
- Need direct API access to latest model versions (use provider APIs)
- Building with Azure ecosystem (use Azure OpenAI)
- Require fine-grained control over RAG pipeline (use LangChain)
- Cost-sensitive prototyping (direct APIs may be cheaper)
References and Documentation
- Official Documentation: https://docs.aws.amazon.com/bedrock/
- API Reference: https://docs.aws.amazon.com/bedrock/latest/APIReference/
- Bedrock Workshop: https://catalog.workshops.aws/building-with-amazon-bedrock/
- Boto3 Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock.html
- Pricing: https://aws.amazon.com/bedrock/pricing/
Conclusion
AWS Bedrock provides a compelling platform for enterprises building generative AI applications within the AWS ecosystem. Its unified API across multiple foundation models, combined with managed services like Knowledge Bases and Agents, significantly reduces the complexity of building production AI systems. The deep integration with AWS security, networking, and monitoring services makes it particularly attractive for regulated industries. While direct provider APIs may offer faster access to cutting-edge features, Bedrock’s enterprise-grade infrastructure and multi-model flexibility make it an excellent choice for organizations committed to AWS.