AWS Bedrock: Building Enterprise AI Applications with Multi-Model Foundation Models

Introduction: Amazon Bedrock is AWS’s fully managed service for building generative AI applications with foundation models. Made generally available in 2023, Bedrock provides a unified API to access models from Anthropic, Meta, Mistral, Cohere, Stability AI, and Amazon’s own Titan family. What sets Bedrock apart is its deep integration with the AWS ecosystem, including built-in RAG with Knowledge Bases, agentic workflows with Bedrock Agents, and enterprise-grade safety controls with Guardrails. This guide covers everything from basic model invocation to building sophisticated AI agents on AWS infrastructure.

AWS Bedrock SDK architecture (diagram): enterprise foundation models with AWS integration

Capabilities and Features

AWS Bedrock provides comprehensive capabilities for enterprise AI:

  • Multi-Model Access: Claude, Llama, Mistral, Titan, Cohere, and Stability AI models (a model-listing sketch follows this list)
  • Bedrock Agents: Build autonomous agents with action groups and knowledge bases
  • Knowledge Bases: Managed RAG with automatic chunking, embedding, and retrieval
  • Guardrails: Content filtering, PII detection, and topic blocking
  • Fine-Tuning: Customize models with your own data
  • Provisioned Throughput: Guaranteed capacity for production workloads
  • Model Evaluation: Compare models on your specific use cases
  • Streaming: Real-time response streaming for supported models
  • Converse API: Unified chat interface across supported text models
  • AWS Integration: IAM, CloudWatch, VPC, and PrivateLink support
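
A quick way to see which of these models are available in your region is the control-plane bedrock client (distinct from bedrock-runtime, which handles inference). A minimal listing sketch, assuming the standard boto3 response fields:

import boto3

# Control-plane client; "bedrock-runtime" is only for inference calls
bedrock = boto3.client("bedrock", region_name="us-east-1")

# List text-generation foundation models available in this region
response = bedrock.list_foundation_models(byOutputModality="TEXT")

for model in response["modelSummaries"]:
    streaming = "streaming" if model.get("responseStreamingSupported") else "no streaming"
    print(f"{model['modelId']} ({model['providerName']}, {streaming})")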

Getting Started

Set up AWS credentials and install boto3:

# Install AWS SDK
pip install boto3

# Configure AWS credentials
aws configure
# Or set environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"

# Enable model access in AWS Console
# Navigate to Bedrock > Model access > Request access
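
Before calling Bedrock, it can help to confirm that your credentials actually resolve; a minimal sanity check via STS:

import boto3

# Verify that the configured credentials resolve to a real identity
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(f"Authenticated as: {identity['Arn']}")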

Basic Model Invocation

Invoke foundation models with the Bedrock Runtime:

import boto3
import json

# Initialize Bedrock Runtime client
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)

# Invoke Claude model
def invoke_claude(prompt: str) -> str:
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        })
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]

# Invoke Llama model
def invoke_llama(prompt: str) -> str:
    response = bedrock_runtime.invoke_model(
        modelId="meta.llama3-70b-instruct-v1:0",
        body=json.dumps({
            "prompt": f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>",
            "max_gen_len": 1024,
            "temperature": 0.7
        })
    )
    result = json.loads(response["body"].read())
    return result["generation"]

# Invoke Titan model
def invoke_titan(prompt: str) -> str:
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-text-express-v1",
        body=json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 1024,
                "temperature": 0.7
            }
        })
    )
    result = json.loads(response["body"].read())
    return result["results"][0]["outputText"]

# Example usage
response = invoke_claude("Explain microservices architecture")
print(response)
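
In production, invoke_model calls can fail with throttling or access errors. Below is a hedged sketch of a wrapper that adds boto3 retry configuration and surfaces the common failure modes (the error codes shown are the usual ones, but verify against your account's behavior):

import json
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Adaptive retry mode backs off automatically on throttling
bedrock_runtime_retry = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
    config=Config(retries={"max_attempts": 5, "mode": "adaptive"})
)

def invoke_claude_safe(prompt: str) -> str:
    try:
        response = bedrock_runtime_retry.invoke_model(
            modelId="anthropic.claude-3-sonnet-20240229-v1:0",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}]
            })
        )
    except ClientError as err:
        code = err.response["Error"]["Code"]
        if code == "AccessDeniedException":
            raise RuntimeError("Model access is not enabled for this account/region") from err
        if code == "ThrottlingException":
            raise RuntimeError("Still throttled after retries; consider Provisioned Throughput") from err
        raise
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]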

Converse API: Unified Chat Interface

Use the Converse API for consistent chat across all models:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def chat_with_model(model_id: str, messages: list, system_prompt: str = None) -> str:
    """Unified chat interface for any Bedrock model."""
    
    kwargs = {
        "modelId": model_id,
        "messages": messages,
        "inferenceConfig": {
            "maxTokens": 1024,
            "temperature": 0.7
        }
    }
    
    if system_prompt:
        kwargs["system"] = [{"text": system_prompt}]
    
    response = bedrock_runtime.converse(**kwargs)
    return response["output"]["message"]["content"][0]["text"]

# Works with any model
models = [
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "meta.llama3-70b-instruct-v1:0",
    "mistral.mistral-large-2402-v1:0"
]

messages = [
    {"role": "user", "content": [{"text": "What is serverless computing?"}]}
]

for model in models:
    print(f"\n{model}:")
    response = chat_with_model(model, messages)
    print(response[:200] + "...")

# Multi-turn conversation
conversation = []

def chat(user_message: str, model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
    conversation.append({
        "role": "user",
        "content": [{"text": user_message}]
    })
    
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=conversation,
        system=[{"text": "You are a helpful AWS solutions architect."}]
    )
    
    assistant_message = response["output"]["message"]
    conversation.append(assistant_message)
    
    return assistant_message["content"][0]["text"]

print(chat("What's the best way to deploy a web application on AWS?"))
print(chat("How would I add a database to that architecture?"))

Knowledge Bases for RAG

Build RAG applications with Bedrock Knowledge Bases:

import boto3

# Initialize clients
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Creating the Knowledge Base itself is best done via the console or IaC (e.g. Terraform);
# this example shows the query interface

def query_knowledge_base(
    knowledge_base_id: str,
    query: str,
    model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"
) -> dict:
    """Query a knowledge base with RAG."""
    
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": f"arn:aws:bedrock:us-east-1::foundation-model/{model_id}",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 5
                    }
                }
            }
        }
    )
    
    return {
        "answer": response["output"]["text"],
        "citations": [
            {
                "text": citation["retrievedReferences"][0]["content"]["text"],
                "source": citation["retrievedReferences"][0]["location"]
            }
            for citation in response.get("citations", [])
            if citation.get("retrievedReferences")
        ]
    }

# Query the knowledge base
result = query_knowledge_base(
    knowledge_base_id="YOUR_KB_ID",
    query="How do I configure auto-scaling for ECS?"
)

print(f"Answer: {result['answer']}")
for citation in result['citations']:
    print(f"Source: {citation['source']}")

Bedrock Agents

Build autonomous agents with action groups:

import boto3
import json

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def invoke_agent(
    agent_id: str,
    agent_alias_id: str,
    session_id: str,
    prompt: str
) -> str:
    """Invoke a Bedrock Agent."""
    
    response = bedrock_agent_runtime.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=prompt
    )
    
    # Process streaming response
    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            completion += event["chunk"]["bytes"].decode()
    
    return completion

# Example: Invoke an agent
response = invoke_agent(
    agent_id="YOUR_AGENT_ID",
    agent_alias_id="YOUR_ALIAS_ID",
    session_id="session-123",
    prompt="Check the status of order #12345 and update the customer"
)
print(response)
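
For debugging, invoke_agent also accepts enableTrace=True, in which case trace events describing the agent's reasoning and action-group calls appear in the same completion stream. A sketch that prints the trace payload as-is rather than parsing it (its exact structure varies by agent configuration):

def invoke_agent_with_trace(agent_id: str, agent_alias_id: str, session_id: str, prompt: str) -> str:
    """Invoke an agent and surface trace events alongside the completion."""
    response = bedrock_agent_runtime.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=prompt,
        enableTrace=True  # emit trace events in the completion stream
    )
    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            completion += event["chunk"]["bytes"].decode()
        elif "trace" in event:
            print(f"[trace] {event['trace']}")  # orchestration and action-group steps
    return completion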

# Lambda function for Action Group
def lambda_handler(event, context):
    """Handle agent action group invocations."""
    
    action_group = event["actionGroup"]
    api_path = event["apiPath"]
    parameters = event.get("parameters", [])
    
    # Route to the appropriate handler (get_order_status and notify_customer are
    # application-specific helpers, not shown here)
    if api_path == "/orders/{orderId}":
        order_id = next(p["value"] for p in parameters if p["name"] == "orderId")
        result = get_order_status(order_id)
    elif api_path == "/customers/{customerId}/notify":
        customer_id = next(p["value"] for p in parameters if p["name"] == "customerId")
        message = event.get("requestBody", {}).get("content", {}).get("message")
        result = notify_customer(customer_id, message)
    else:
        result = {"error": "Unknown action"}
    
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": action_group,
            "apiPath": api_path,
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {
                    "body": json.dumps(result)
                }
            }
        }
    }
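
The api_path values the Lambda routes on come from the action group's OpenAPI schema. A minimal, illustrative schema covering the two paths above (names and descriptions are placeholders, not a schema from a real deployment):

# Illustrative OpenAPI schema for the action group above
action_group_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Order management actions", "version": "1.0.0"},
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "description": "Get the status of an order",
                "parameters": [{
                    "name": "orderId", "in": "path", "required": True,
                    "schema": {"type": "string"}
                }],
                "responses": {"200": {"description": "Order status"}}
            }
        },
        "/customers/{customerId}/notify": {
            "post": {
                "description": "Send a notification message to a customer",
                "parameters": [{
                    "name": "customerId", "in": "path", "required": True,
                    "schema": {"type": "string"}
                }],
                "requestBody": {"content": {"application/json": {"schema": {
                    "type": "object",
                    "properties": {"message": {"type": "string"}}
                }}}},
                "responses": {"200": {"description": "Notification result"}}
            }
        }
    }
}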

Guardrails for Safety

Apply Guardrails at inference time for content filtering, PII detection, and topic blocking:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def invoke_with_guardrails(
    prompt: str,
    guardrail_id: str,
    guardrail_version: str = "DRAFT"
) -> dict:
    """Invoke model with guardrails applied."""
    
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version
        }
    )
    
    # Check if guardrail intervened
    if response.get("stopReason") == "guardrail_intervened":
        return {
            "blocked": True,
            "reason": response.get("trace", {}).get("guardrail", {})
        }
    
    return {
        "blocked": False,
        "response": response["output"]["message"]["content"][0]["text"]
    }

# Example with guardrails
result = invoke_with_guardrails(
    prompt="Tell me about AWS security best practices",
    guardrail_id="YOUR_GUARDRAIL_ID"
)

if result["blocked"]:
    print(f"Request blocked: {result['reason']}")
else:
    print(result["response"])

Streaming Responses

Stream tokens as they are generated using the ConverseStream API:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def stream_response(prompt: str, model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
    """Stream responses from Bedrock models."""
    
    response = bedrock_runtime.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024}
    )
    
    for event in response["stream"]:
        if "contentBlockDelta" in event:
            text = event["contentBlockDelta"]["delta"].get("text", "")
            print(text, end="", flush=True)
        elif "messageStop" in event:
            print("\n--- Stream complete ---")

# Stream a response
stream_response("Write a detailed explanation of AWS Lambda")

Benchmarks and Performance

Approximate on-demand pricing and typical latency for commonly used Bedrock models (rates vary by region and change over time; check the AWS pricing page for current figures). A quick cost-estimation sketch follows the table:

Model                | Input Cost       | Output Cost      | Latency (p50)
Claude 3 Sonnet      | $3/M tokens      | $15/M tokens     | ~2s
Claude 3 Haiku       | $0.25/M tokens   | $1.25/M tokens   | ~0.5s
Llama 3 70B          | $2.65/M tokens   | $3.50/M tokens   | ~1.5s
Mistral Large        | $4/M tokens      | $12/M tokens     | ~1.8s
Titan Text Express   | $0.20/M tokens   | $0.60/M tokens   | ~0.8s
Knowledge Base Query | $0.035/query     | + model costs    | ~3s
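
As a back-of-the-envelope check, here is how the per-token rates translate into a monthly estimate (rates taken from the table above; substitute current pricing for real planning):

# Rough monthly cost estimate using the Claude 3 Sonnet rates from the table
input_tokens = 10_000_000   # 10M input tokens per month
output_tokens = 2_000_000   # 2M output tokens per month

input_cost = input_tokens / 1_000_000 * 3.00     # $3 per million input tokens
output_cost = output_tokens / 1_000_000 * 15.00  # $15 per million output tokens

print(f"Estimated monthly cost: ${input_cost + output_cost:.2f}")  # $60.00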

When to Use AWS Bedrock

Best suited for:

  • Enterprise applications requiring AWS ecosystem integration
  • Multi-model strategies with unified API access
  • Applications needing managed RAG (Knowledge Bases)
  • Compliance-sensitive workloads (HIPAA, SOC 2, FedRAMP)
  • Building autonomous agents with AWS infrastructure
  • Teams with existing AWS investments

Consider alternatives when:

  • Need direct API access to latest model versions (use provider APIs)
  • Building with Azure ecosystem (use Azure OpenAI)
  • Require fine-grained control over RAG pipeline (use LangChain)
  • Cost-sensitive prototyping (direct APIs may be cheaper)

Conclusion

AWS Bedrock provides a compelling platform for enterprises building generative AI applications within the AWS ecosystem. Its unified API across multiple foundation models, combined with managed services like Knowledge Bases and Agents, significantly reduces the complexity of building production AI systems. The deep integration with AWS security, networking, and monitoring services makes it particularly attractive for regulated industries. While direct provider APIs may offer faster access to cutting-edge features, Bedrock’s enterprise-grade infrastructure and multi-model flexibility make it an excellent choice for organizations committed to AWS.

