Production Deployment on Google Cloud – Part 4 of 5

From Development to Production-Ready Agents

Series Navigation: Part 1: Introduction | Part 2: Tools & Memory | Part 3: Multi-Agent Systems | Part 4: Production Deployment | Part 5: Advanced Patterns (Coming Soon)

You’ve built sophisticated multi-agent systems. Now it’s time to deploy them to production. This article is a practical reference for deploying ADK agents on Google Cloud, covering deployment options, observability, security, and CI/CD workflows.

We’ll compare Cloud Run, Vertex AI Agent Engine, and GKE deployments, implement comprehensive monitoring with Cloud Trace and BigQuery, and set up production-grade CI/CD pipelines.

Deployment Options Overview

Google Cloud offers three primary deployment paths for ADK agents:

Option | Best For | Complexity | Cost | Scalability
Cloud Run | REST APIs, webhooks, serverless workloads | ⭐ Low | 💰 Pay per use | 0–1000 instances
Vertex AI Agent Engine | Complex agents, managed orchestration | ⭐⭐ Medium | 💰💰 Managed service | Fully managed
GKE (Kubernetes) | Enterprise, multi-tenant, custom networking | ⭐⭐⭐ High | 💰💰💰 Reserved resources | Unlimited
🏗️ Decision Matrix:

  • Choose Cloud Run for REST APIs, low-medium traffic (<10K req/day), simple architectures
  • Choose Vertex AI Agent Engine for complex multi-agent systems, managed orchestration,
    native Gemini features
  • Choose GKE for enterprise scale (>100K req/day), custom networking, multi-tenancy,
    regulatory requirements
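The decision matrix above can be encoded as a small helper for sanity-checking a choice. This is our own illustrative function, not part of ADK; the thresholds mirror the bullets and are rough guidance, not hard limits:

```python
def recommend_deployment(
    req_per_day: int,
    needs_managed_orchestration: bool = False,
    needs_custom_networking: bool = False,
) -> str:
    """Rough encoding of the decision matrix above.

    Follows the article's guidance: >100K req/day or custom networking
    points to GKE; managed orchestration points to Agent Engine;
    everything else fits Cloud Run.
    """
    if needs_custom_networking or req_per_day > 100_000:
        return "GKE"
    if needs_managed_orchestration:
        return "Vertex AI Agent Engine"
    return "Cloud Run"
```

For borderline workloads (say, 50K req/day with multi-agent orchestration), the softer criteria in the bullets should override the raw traffic number.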

Option 1: Cloud Run Deployment

Cloud Run is Google’s serverless container platform. It’s ideal for stateless agents serving REST
APIs.

Step 1: Dockerize Your Agent

# Dockerfile for ADK Agent
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY agents/ ./agents/
COPY tools/ ./tools/
COPY config/ ./config/
COPY main.py .

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PORT=8080

# Run the application
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

# requirements.txt
google-adk-core==1.0.0
google-adk-vertexai==1.0.0
google-cloud-trace==1.11.0
google-cloud-logging==3.8.0
flask[async]==3.0.0
gunicorn==21.2.0

Step 2: Create Flask API Wrapper

"""
main.py
Flask API wrapper for ADK agent deployment on Cloud Run
"""

from flask import Flask, request, jsonify
from google.cloud import logging as cloud_logging
from datetime import datetime
from agents.search_agent import SearchAssistant
import os

# Initialize Cloud Logging
logging_client = cloud_logging.Client()
logging_client.setup_logging()

app = Flask(__name__)

# Initialize agent (singleton)
agent = None

def get_agent():
    """Lazy initialization of agent."""
    global agent
    if agent is None:
        config_path = os.getenv('AGENT_CONFIG', 'config/production.yaml')
        agent = SearchAssistant(config_path=config_path)
    return agent

@app.route('/health', methods=['GET'])
def health_check():
    """Health check endpoint for Cloud Run."""
    return jsonify({
        'status': 'healthy',
        'service': 'adk-agent',
        'version': '1.0.0'
    }), 200

@app.route('/agent/query', methods=['POST'])
async def agent_query():
    """
    Main agent endpoint.
    
    Request:
        {
            "query": "What is Google ADK?",
            "context": {"user_id": "123"},
            "session_id": "abc-123"
        }
    
    Response:
        {
            "answer": "...",
            "metadata": {...}
        }
    """
    try:
        data = request.get_json()
        
        if not data or 'query' not in data:
            return jsonify({'error': 'Missing query parameter'}), 400
        
        query = data['query']
        context = data.get('context', {})
        session_id = data.get('session_id')
        
        # Get agent instance
        agent_instance = get_agent()
        
        # Process query (async)
        answer = await agent_instance.ask(
            question=query,
            context=context
        )
        
        return jsonify({
            'answer': answer,
            'session_id': session_id,
            'metadata': {
                'model': 'gemini-1.5-pro-002',
                'timestamp': datetime.utcnow().isoformat()
            }
        }), 200
        
    except Exception as e:
        app.logger.error(f"Agent query error: {str(e)}")
        return jsonify({'error': 'Internal server error'}), 500

@app.route('/agent/batch', methods=['POST'])
async def agent_batch():
    """
    Batch processing endpoint.
    
    Request:
        {
            "queries": ["query1", "query2", "query3"]
        }
    
    Response:
        {
            "results": [
                {"query": "query1", "answer": "..."},
                ...
            ]
        }
    """
    try:
        data = request.get_json()
        queries = data.get('queries', [])
        
        if not queries:
            return jsonify({'error': 'No queries provided'}), 400
        
        agent_instance = get_agent()
        
        # Process in parallel
        answers = await agent_instance.ask_batch(queries)
        
        results = [
            {'query': q, 'answer': a}
            for q, a in zip(queries, answers)
        ]
        
        return jsonify({'results': results}), 200
        
    except Exception as e:
        app.logger.error(f"Batch processing error: {str(e)}")
        return jsonify({'error': 'Internal server error'}), 500

if __name__ == '__main__':
    port = int(os.environ.get('PORT', 8080))
    app.run(host='0.0.0.0', port=port)

Step 3: Deploy to Cloud Run

# Build and push container
gcloud builds submit --tag gcr.io/PROJECT_ID/adk-agent

# Deploy to Cloud Run
gcloud run deploy adk-agent \
  --image gcr.io/PROJECT_ID/adk-agent \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars "GOOGLE_CLOUD_PROJECT=PROJECT_ID" \
  --memory 2Gi \
  --cpu 2 \
  --timeout 300 \
  --concurrency 80 \
  --min-instances 0 \
  --max-instances 100

# Get service URL
gcloud run services describe adk-agent \
  --region us-central1 \
  --format 'value(status.url)'
Cloud Run Best Practices:

  1. Concurrency: Set to 80-100 for CPU-intensive agents, 200-300 for I/O-bound
  2. Memory: 2Gi minimum for agents with memory/vector stores
  3. Timeout: 300s is a sensible ceiling (Cloud Run allows up to 3600s) – optimize agents to respond faster
  4. Min instances: Set to 1-3 for production to avoid cold starts
  5. CPU Allocation: Use “CPU always allocated” for consistent performance
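Items 1 and 4 interact: concurrency and expected load determine how many instances Cloud Run will actually run. A Little's-law sketch (our own helper, not an official sizing tool) estimates the steady-state instance count:

```python
import math

def estimated_instances(rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Little's law: in-flight requests ≈ rps * latency; dividing by
    per-instance concurrency estimates how many instances Cloud Run
    will scale to under steady load."""
    return max(1, math.ceil(rps * avg_latency_s / concurrency))

# 100 req/s at 2 s per request with concurrency 80 → 200 in flight → 3 instances
print(estimated_instances(100, 2.0, 80))
```

Use this to sanity-check `--min-instances` and `--max-instances` against your expected peak traffic.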

Reference: Cloud Run Autoscaling

Cloud Run with Cloud Load Balancing

For production traffic management:

# Create backend service
gcloud compute backend-services create adk-backend \
  --global \
  --load-balancing-scheme=EXTERNAL_MANAGED

# Add Cloud Run NEG
gcloud compute network-endpoint-groups create adk-neg \
  --region=us-central1 \
  --network-endpoint-type=SERVERLESS \
  --cloud-run-service=adk-agent

gcloud compute backend-services add-backend adk-backend \
  --global \
  --network-endpoint-group=adk-neg \
  --network-endpoint-group-region=us-central1

# Create URL map and load balancer
gcloud compute url-maps create adk-lb \
  --default-service adk-backend

gcloud compute target-https-proxies create adk-https-proxy \
  --url-map adk-lb

# Reserve IP and create forwarding rule
gcloud compute addresses create adk-ip --global

gcloud compute forwarding-rules create adk-https-rule \
  --global \
  --target-https-proxy=adk-https-proxy \
  --address=adk-ip \
  --ports=443

Option 2: Vertex AI Agent Engine Deployment

Vertex AI Agent Engine provides managed orchestration for complex multi-agent systems.

Define Agent Configuration

# agent_config.yaml for Vertex AI Agent Engine
apiVersion: aiplatform.googleapis.com/v1
kind: Agent
metadata:
  name: research-assistant
  project: my-project
  location: us-central1

spec:
  # Agent definition
  displayName: "Research Assistant Agent"
  description: "Multi-agent research assistant with search and synthesis"
  
  # Model configuration
  model:
    name: "gemini-1.5-pro-002"
    parameters:
      temperature: 0.7
      topP: 0.95
      maxOutputTokens: 2048
  
  # Tools
  tools:
    - name: "google_search"
      type: "GOOGLE_SEARCH"
      config:
        maxResults: 10
        safeSearch: true
    
    - name: "custom_database"
      type: "FUNCTION"
      function:
        name: "query_database"
        description: "Query PostgreSQL database"
        parameters:
          type: "object"
          properties:
            query:
              type: "string"
              description: "SQL SELECT query"
          required: ["query"]
        serviceAccount: "agent-sa@project.iam.gserviceaccount.com"
        endpoint: "https://db-function-url.run.app"
  
  # Memory configuration
  memory:
    conversationBuffer:
      maxMessages: 20
    
    vectorStore:
      enabled: true
      indexEndpoint: "projects/PROJECT/locations/us-central1/indexEndpoints/INDEX_ID"
      embeddingModel: "textembedding-gecko@003"
  
  # Safety settings
  safetySettings:
    - category: "HARM_CATEGORY_HATE_SPEECH"
      threshold: "BLOCK_MEDIUM_AND_ABOVE"
    - category: "HARM_CATEGORY_DANGEROUS_CONTENT"
      threshold: "BLOCK_MEDIUM_AND_ABOVE"
  
  # Observability
  observability:
    cloudTrace:
      enabled: true
    cloudLogging:
      enabled: true
      logLevel: "INFO"
    bigQueryExport:
      enabled: true
      dataset: "agent_analytics"
      table: "agent_requests"

Deploy Agent to Vertex AI

# Deploy agent
gcloud ai agents deploy research-assistant \
  --config agent_config.yaml \
  --region us-central1 \
  --project my-project

# Get agent endpoint
gcloud ai agents describe research-assistant \
  --region us-central1 \
  --format 'value(endpoint)'

# Test agent
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the latest developments in multi-agent systems?",
    "sessionId": "test-session-123"
  }' \
  https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/REGION/agents/research-assistant:query

Reference: Vertex AI Agent Engine Documentation

Option 3: GKE (Kubernetes) Deployment

For enterprise-scale deployments with custom requirements.

Kubernetes Deployment Manifest

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adk-agent
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: adk-agent
  template:
    metadata:
      labels:
        app: adk-agent
        version: v1.0.0
    spec:
      serviceAccountName: adk-agent-sa
      
      containers:
      - name: agent
        image: gcr.io/PROJECT/adk-agent:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        
        env:
        - name: GOOGLE_CLOUD_PROJECT
          value: "my-project"
        - name: AGENT_CONFIG
          value: "/config/production.yaml"
        
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        
        volumeMounts:
        - name: config
          mountPath: /config
          readOnly: true
        - name: secrets
          mountPath: /secrets
          readOnly: true
      
      volumes:
      - name: config
        configMap:
          name: agent-config
      - name: secrets
        secret:
          secretName: agent-secrets

---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: adk-agent
  namespace: production
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: adk-agent

---
# hpa.yaml - Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: adk-agent-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: adk-agent
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 30
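The HPA above scales on CPU and memory utilization. Kubernetes computes the target replica count with a simple ratio (the documented HPA formula, reproduced here for illustration):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float) -> int:
    """Kubernetes HPA core formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 3 replicas at 140% CPU against a 70% target doubles the deployment
print(hpa_desired_replicas(3, 140, 70))
```

With both CPU and memory metrics configured, the HPA takes the larger of the two desired counts, then applies the `behavior` policies above to rate-limit the change.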

Deploy to GKE

# Create GKE cluster
gcloud container clusters create adk-cluster \
  --region us-central1 \
  --num-nodes 3 \
  --machine-type n2-standard-4 \
  --enable-autoscaling \
  --min-nodes 3 \
  --max-nodes 20 \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --workload-pool=PROJECT.svc.id.goog

# Get credentials
gcloud container clusters get-credentials adk-cluster \
  --region us-central1

# Create namespace
kubectl create namespace production

# Apply manifests
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f hpa.yaml

# Check deployment
kubectl get pods -n production
kubectl get hpa -n production

Observability and Monitoring

Cloud Trace Integration

ADK automatically traces all agent operations when enabled:

"""
Enable Cloud Trace in agent configuration
"""

from google.cloud import trace_v2
from adk import Agent, AgentConfig
from adk.observability import enable_tracing

# Enable tracing
enable_tracing(
    project_id='my-project',
    sampling_rate=1.0,  # 100% sampling for development, 0.1 for production
)

# Agent will automatically trace:
# - LLM calls
# - Tool executions
# - Memory operations
# - Multi-agent communication

agent = Agent(
    config=AgentConfig(
        name="traced-agent",
        observability={
            'cloud_trace': True,
            'sampling_rate': 0.1,  # 10% in production
        }
    )
)

Custom Trace Spans

from adk.observability import trace_span

class CustomAgent:
    @trace_span(name="custom_processing")
    async def process(self, data):
        """Custom processing with automatic tracing."""
        
        # Your logic here
        result = await self.analyze(data)
        
        return result
    
    @trace_span(name="analysis", attributes={"model": "gemini-1.5-pro"})
    async def analyze(self, data):
        """Analysis step with custom attributes."""
        
        # Traced automatically
        response = await self.agent.run(data)
        
        return response
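If you want the same decorator shape without the exporter wired up (or your ADK version lacks `trace_span`), a minimal stand-in is easy to sketch. This hypothetical version just times the coroutine and prints the span instead of shipping it to Cloud Trace:

```python
import functools
import time

def trace_span(name, attributes=None):
    """Minimal stand-in for an ADK-style span decorator: times the
    wrapped coroutine and prints the span instead of exporting it."""
    def wrap(fn):
        @functools.wraps(fn)
        async def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                print(f"span={name} attrs={attributes or {}} elapsed_ms={elapsed_ms:.1f}")
        return inner
    return wrap
```

Because the timing lives in a `finally` block, the span is recorded even when the wrapped call raises, mirroring how real tracing libraries close spans on error.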

BigQuery Analytics Export

"""
Export agent metrics to BigQuery for analysis
"""

from adk.observability import BigQueryExporter

# Configure BigQuery export
exporter = BigQueryExporter(
    project_id='my-project',
    dataset_id='agent_analytics',
    table_id='agent_requests',
    schema=[
        {'name': 'timestamp', 'type': 'TIMESTAMP'},
        {'name': 'agent_name', 'type': 'STRING'},
        {'name': 'session_id', 'type': 'STRING'},
        {'name': 'query', 'type': 'STRING'},
        {'name': 'response_time_ms', 'type': 'FLOAT'},
        {'name': 'token_count', 'type': 'INTEGER'},
        {'name': 'tools_used', 'type': 'STRING', 'mode': 'REPEATED'},
        {'name': 'success', 'type': 'BOOLEAN'},
        {'name': 'error_message', 'type': 'STRING'},
    ]
)

# Attach to agent
agent = Agent(
    config=agent_config,
    observability_exporter=exporter,
)

# Metrics are automatically exported
# Query in BigQuery:
"""
SELECT
  agent_name,
  AVG(response_time_ms) AS avg_response_time,
  COUNT(*) AS total_requests,
  COUNTIF(success) AS successful_requests,
  ARRAY(SELECT DISTINCT t FROM UNNEST(ARRAY_CONCAT_AGG(tools_used)) AS t) AS tools_used
FROM `project.agent_analytics.agent_requests`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY agent_name
"""
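Rows that don't match the export schema fail at insert time, so it can pay to validate before exporting. A small checker mirroring the schema above (our own helper; `tools_used` is the REPEATED field, i.e. a list of strings):

```python
# Mirrors the BigQueryExporter schema above.
SCHEMA = {
    'timestamp': 'TIMESTAMP',
    'agent_name': 'STRING',
    'session_id': 'STRING',
    'query': 'STRING',
    'response_time_ms': 'FLOAT',
    'token_count': 'INTEGER',
    'tools_used': 'STRING',   # mode: REPEATED → list of strings
    'success': 'BOOLEAN',
    'error_message': 'STRING',
}

PYTHON_TYPES = {'TIMESTAMP': str, 'STRING': str, 'FLOAT': float,
                'INTEGER': int, 'BOOLEAN': bool}

def validate_row(row: dict) -> list:
    """Return a list of schema problems for one metrics row (empty = valid)."""
    problems = []
    for field, bq_type in SCHEMA.items():
        if field not in row:
            problems.append(f"missing field: {field}")
            continue
        value = row[field]
        if field == 'tools_used':
            if not (isinstance(value, list) and all(isinstance(t, str) for t in value)):
                problems.append("tools_used must be a list of strings")
        elif not isinstance(value, PYTHON_TYPES[bq_type]):
            problems.append(f"{field}: expected {bq_type}, got {type(value).__name__}")
    return problems
```

Running this in the request path (or in a pre-export hook, if your exporter supports one) surfaces schema drift immediately instead of as silent analytics gaps.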

Cloud Monitoring Dashboards

# Create monitoring dashboard
cat > dashboard.json <<EOF
{
  "displayName": "ADK Agent Monitoring",
  "dashboardFilters": [],
  "gridLayout": {
    "widgets": [
      {
        "title": "Request Rate",
        "xyChart": {
          "dataSets": [{
            "timeSeriesQuery": {
              "timeSeriesFilter": {
                "filter": "resource.type=\"cloud_run_revision\" resource.labels.service_name=\"adk-agent\"",
                "aggregation": {
                  "alignmentPeriod": "60s",
                  "perSeriesAligner": "ALIGN_RATE"
                }
              }
            }
          }]
        }
      },
      {
        "title": "Response Latency (p50, p95, p99)",
        "xyChart": {
          "dataSets": [{
            "timeSeriesQuery": {
              "timeSeriesFilter": {
                "filter": "resource.type=\"cloud_run_revision\" metric.type=\"run.googleapis.com/request_latencies\"",
                "aggregation": {
                  "alignmentPeriod": "60s",
                  "crossSeriesReducer": "REDUCE_PERCENTILE_50"
                }
              }
            }
          }]
        }
      }
    ]
  }
}
EOF

gcloud monitoring dashboards create --config-from-file=dashboard.json

Monitoring Reference: Cloud Monitoring Documentation

Security Best Practices

Service Account Setup

# Create service account
gcloud iam service-accounts create adk-agent \
  --display-name="ADK Agent Service Account"

# Grant minimum required permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/cloudtrace.agent"

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/logging.logWriter"

# For secret access
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

Secret Manager Integration

# Store API keys in Secret Manager
echo -n "my-api-key" | gcloud secrets create database-password \
  --data-file=- \
  --replication-policy="automatic"

# Grant access
gcloud secrets add-iam-policy-binding database-password \
  --member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
# Access secrets in code
from google.cloud import secretmanager

def get_secret(secret_id, project_id, version_id="latest"):
    """Retrieve secret from Secret Manager."""
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version_id}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode('UTF-8')

# Use in agent
db_password = get_secret("database-password", "my-project")
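Each Secret Manager access is a network round-trip, so a common pattern is to cache secrets per process. A sketch using `functools.cache` with a stand-in fetch (in real code, swap in `get_secret` from above; the counter is only there to make the caching visible):

```python
import functools

calls = {"n": 0}  # instrumentation so the caching effect is visible

def fetch_secret(secret_id: str) -> str:
    """Stand-in for the Secret Manager round-trip in get_secret() above."""
    calls["n"] += 1
    return f"value-of-{secret_id}"

@functools.cache
def cached_secret(secret_id: str) -> str:
    """Each instance pays the Secret Manager call once per secret,
    not once per request."""
    return fetch_secret(secret_id)
```

Note the trade-off: a cached value won't pick up a rotated secret until the instance restarts, so pair this with short instance lifetimes or an explicit `cached_secret.cache_clear()` on rotation.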

VPC Service Controls

# Create service perimeter for production
gcloud access-context-manager perimeters create adk_perimeter \
  --title="ADK Production Perimeter" \
  --resources=projects/PROJECT_NUMBER \
  --restricted-services=aiplatform.googleapis.com,storage.googleapis.com \
  --vpc-allowed-services=ALLOW_ALL

# This restricts data exfiltration and ensures agents only access authorized services

CI/CD Pipeline

Cloud Build Configuration

# cloudbuild.yaml
steps:
  # Run tests
  - name: 'python:3.11'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        pip install -r requirements.txt
        pip install pytest pytest-asyncio
        pytest tests/ -v

  # Build container
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
      - '-t'
      - 'gcr.io/$PROJECT_ID/adk-agent:latest'
      - '.'

  # Push to Container Registry
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'push'
      - 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'

  # Deploy to Cloud Run (staging)
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: 'gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'adk-agent-staging'
      - '--image'
      - 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
      - '--region'
      - 'us-central1'
      - '--platform'
      - 'managed'

  # Run integration tests
  - name: 'python:3.11'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        pip install requests
        python tests/integration_test.py --url=$(gcloud run services describe adk-agent-staging --region us-central1 --format 'value(status.url)')

  # Deploy to production (manual approval required)
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: 'gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'adk-agent'
      - '--image'
      - 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
      - '--region'
      - 'us-central1'
      - '--platform'
      - 'managed'
      - '--tag'
      - 'v$SHORT_SHA'
      - '--no-traffic'  # Canary deployment

images:
  - 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
  - 'gcr.io/$PROJECT_ID/adk-agent:latest'

options:
  machineType: 'N1_HIGHCPU_8'
  logging: CLOUD_LOGGING_ONLY

Canary Deployment Strategy

# Deploy new version with no traffic
gcloud run deploy adk-agent \
  --image gcr.io/PROJECT/adk-agent:v2 \
  --region us-central1 \
  --tag v2 \
  --no-traffic

# Route 10% of traffic to the tagged canary
# (since the deploy above used --no-traffic, the new revision is LATEST,
#  so split by tag; the remaining 90% stays on the stable revision)
gcloud run services update-traffic adk-agent \
  --region us-central1 \
  --to-tags v2=10

# Monitor metrics for 1 hour
# If successful, gradually increase:
gcloud run services update-traffic adk-agent \
  --region us-central1 \
  --to-tags v2=50

# Full rollout
gcloud run services update-traffic adk-agent \
  --region us-central1 \
  --to-latest

# Rollback if needed: drop canary traffic back to zero
gcloud run services update-traffic adk-agent \
  --region us-central1 \
  --to-tags v2=0

Production Checklist

✅ Pre-Production Checklist

Infrastructure:

  • ☐ Choose deployment option (Cloud Run / Vertex AI / GKE)
  • ☐ Set up separate dev/staging/production environments
  • ☐ Configure autoscaling and resource limits
  • ☐ Set up load balancing and CDN if needed

Security:

  • ☐ Create service accounts with minimum permissions
  • ☐ Store all secrets in Secret Manager
  • ☐ Enable VPC Service Controls for data isolation
  • ☐ Implement authentication (Cloud IAP, API keys, OAuth)
  • ☐ Configure firewalls and network policies

Observability:

  • ☐ Enable Cloud Trace with appropriate sampling
  • ☐ Configure Cloud Logging with log levels
  • ☐ Set up BigQuery export for analytics
  • ☐ Create monitoring dashboards
  • ☐ Configure alerting (latency, errors, costs)

Testing:

  • ☐ Unit tests for all agents and tools
  • ☐ Integration tests for agent interactions
  • ☐ Load testing with expected traffic patterns
  • ☐ Chaos engineering for failure scenarios

CI/CD:

  • ☐ Automated testing in pipeline
  • ☐ Canary deployment strategy
  • ☐ Automated rollback capabilities
  • ☐ Version tagging and artifact management

Cost Optimization

Cloud Run Cost Model

# Estimate Cloud Run costs
def estimate_cloud_run_cost(
    requests_per_month: int,
    avg_response_time_seconds: float,
    memory_gb: float = 2,
    cpu_count: int = 2,
):
    """
    Estimate monthly Cloud Run costs.
    
    Pricing (as of 2025):
    - Requests: $0.40 per million
    - CPU: $0.00002400 per vCPU-second
    - Memory: $0.00000250 per GiB-second
    """
    
    # Request costs
    request_cost = (requests_per_month / 1_000_000) * 0.40
    
    # Compute costs
    total_cpu_seconds = requests_per_month * avg_response_time_seconds
    cpu_cost = total_cpu_seconds * cpu_count * 0.00002400
    
    total_memory_seconds = requests_per_month * avg_response_time_seconds
    memory_cost = total_memory_seconds * memory_gb * 0.00000250
    
    total_monthly = request_cost + cpu_cost + memory_cost
    
    return {
        'requests': request_cost,
        'cpu': cpu_cost,
        'memory': memory_cost,
        'total': total_monthly
    }

# Example calculation
costs = estimate_cloud_run_cost(
    requests_per_month=100_000,
    avg_response_time_seconds=2.0,
    memory_gb=2,
    cpu_count=2,
)

print(f"Estimated monthly cost: ${costs['total']:.2f}")
# Output: Estimated monthly cost: $10.64

Optimization Strategies

  1. Model Selection: Use Gemini 1.5 Flash for simple queries (5x cheaper than Pro)
  2. Caching: Cache frequent queries with Redis/Memorystore
  3. Batch Processing: Process multiple queries in single request
  4. Streaming: Use streaming responses to reduce timeout costs
  5. Smart Routing: Route simple queries to cheaper models
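Strategy 5 can start as simply as a word-count gate in front of model selection. A naive router sketch (the threshold and heuristic are illustrative, not tuned; a production router would classify intent, not length):

```python
def pick_model(query: str, word_threshold: int = 24) -> str:
    """Route short lookup-style queries to Flash (cheaper) and longer
    analytical prompts to Pro."""
    if len(query.split()) <= word_threshold:
        return "gemini-1.5-flash-002"
    return "gemini-1.5-pro-002"
```

Even a crude gate like this shifts the bulk of simple traffic onto the cheaper model while keeping complex requests on Pro.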

Cost Reference: Cloud Run Pricing | Vertex AI Pricing

Key Takeaways

  • Cloud Run is the fastest path to production for stateless REST agents; Vertex AI Agent Engine suits managed multi-agent orchestration; GKE fits enterprise scale and custom networking
  • Treat observability as a first-class feature: Cloud Trace, Cloud Logging, and BigQuery export from day one
  • Lock down security early with least-privilege service accounts, Secret Manager, and VPC Service Controls
  • Ship through CI/CD with canary deployments and an automated rollback path

📚 Coming in Part 5: Advanced Patterns & Case Studies

In the final article, we’ll cover advanced agent patterns and real-world implementations:

  • Advanced Patterns: ReAct, Plan-and-Execute, Reflexion, Tree-of-Thoughts
  • RAG Integration: Vector search with Vertex AI Search
  • Case Study 1: E-commerce order management agent
  • Case Study 2: DevOps incident response system
  • Performance Benchmarks: Latency, cost, and accuracy comparisons
  • Future Roadmap: Gemini 2.0, multi-modal agents, agent marketplaces

Publication Date: June 2025, Week 1

Next: Part 5 – Advanced Patterns & Case Studies

