Series Navigation: Part 1: Introduction | Part 2: Tools & Memory | Part 3: Multi-Agent Systems | Part 4: Production Deployment | Part 5: Advanced Patterns (Coming Soon)
You’ve built sophisticated multi-agent systems. Now it’s time to deploy them to production. This article provides reference documentation for deploying ADK agents on Google Cloud, covering deployment options, observability, security, and CI/CD workflows.
We’ll compare Cloud Run, Vertex AI Agent Engine, and GKE deployments, implement comprehensive monitoring with Cloud Trace and BigQuery, and set up production-grade CI/CD pipelines.
Deployment Options Overview
Google Cloud offers three primary deployment paths for ADK agents:
| Option | Best For | Complexity | Cost | Scalability |
|---|---|---|---|---|
| Cloud Run | REST APIs, webhooks, serverless workloads | ⭐ Low | 💰 Pay per use | 0-1000 instances |
| Vertex AI Agent Engine | Complex agents, managed orchestration | ⭐⭐ Medium | 💰💰 Managed service | Fully managed |
| GKE (Kubernetes) | Enterprise, multi-tenant, custom networking | ⭐⭐⭐ High | 💰💰💰 Reserved resources | Unlimited |
- Choose Cloud Run for REST APIs, low-to-medium traffic (<10K req/day), and simple architectures
- Choose Vertex AI Agent Engine for complex multi-agent systems, managed orchestration, and native Gemini features
- Choose GKE for enterprise scale (>100K req/day), custom networking, multi-tenancy, and regulatory requirements
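As a rough illustration, the rules of thumb above can be encoded as a small decision helper. The thresholds here are the heuristics quoted in this article, not official platform limits:

```python
def choose_deployment(req_per_day: int,
                      multi_agent: bool = False,
                      multi_tenant: bool = False,
                      custom_networking: bool = False) -> str:
    """Encode the article's rules of thumb. Thresholds are heuristics, not limits."""
    # Enterprise-scale signals push toward GKE
    if req_per_day > 100_000 or multi_tenant or custom_networking:
        return "GKE"
    # Complex orchestration favors the managed agent runtime
    if multi_agent:
        return "Vertex AI Agent Engine"
    # Simple, stateless REST workloads default to serverless
    return "Cloud Run"

print(choose_deployment(5_000))                     # simple REST API
print(choose_deployment(50_000, multi_agent=True))  # managed orchestration
print(choose_deployment(500_000))                   # enterprise scale
```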
Option 1: Cloud Run Deployment
Cloud Run is Google’s serverless container platform. It’s ideal for stateless agents serving REST
APIs.
Step 1: Dockerize Your Agent
# Dockerfile for ADK Agent
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY agents/ ./agents/
COPY tools/ ./tools/
COPY config/ ./config/
COPY main.py .
# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
# Run the application
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
# requirements.txt
google-adk-core==1.0.0
google-adk-vertexai==1.0.0
google-cloud-trace==1.11.0
google-cloud-logging==3.8.0
flask[async]==3.0.0
gunicorn==21.2.0
Step 2: Create Flask API Wrapper
"""
main.py
Flask API wrapper for ADK agent deployment on Cloud Run
"""
from flask import Flask, request, jsonify
from google.cloud import logging as cloud_logging
import asyncio
from agents.search_agent import SearchAssistant
import os
from datetime import datetime
# Initialize Cloud Logging
logging_client = cloud_logging.Client()
logging_client.setup_logging()
app = Flask(__name__)
# Initialize agent (singleton)
agent = None
def get_agent():
"""Lazy initialization of agent."""
global agent
if agent is None:
config_path = os.getenv('AGENT_CONFIG', 'config/production.yaml')
agent = SearchAssistant(config_path=config_path)
return agent
@app.route('/health', methods=['GET'])
def health_check():
"""Health check endpoint for Cloud Run."""
return jsonify({
'status': 'healthy',
'service': 'adk-agent',
'version': '1.0.0'
}), 200
@app.route('/agent/query', methods=['POST'])
async def agent_query():
"""
Main agent endpoint.
Request:
{
"query": "What is Google ADK?",
"context": {"user_id": "123"},
"session_id": "abc-123"
}
Response:
{
"answer": "...",
"metadata": {...}
}
"""
try:
data = request.get_json()
if not data or 'query' not in data:
return jsonify({'error': 'Missing query parameter'}), 400
query = data['query']
context = data.get('context', {})
session_id = data.get('session_id')
# Get agent instance
agent_instance = get_agent()
# Process query (async)
answer = await agent_instance.ask(
question=query,
context=context
)
return jsonify({
'answer': answer,
'session_id': session_id,
'metadata': {
'model': 'gemini-1.5-pro-002',
'timestamp': datetime.utcnow().isoformat()
}
}), 200
except Exception as e:
app.logger.error(f"Agent query error: {str(e)}")
return jsonify({'error': 'Internal server error'}), 500
@app.route('/agent/batch', methods=['POST'])
async def agent_batch():
"""
Batch processing endpoint.
Request:
{
"queries": ["query1", "query2", "query3"]
}
Response:
{
"results": [
{"query": "query1", "answer": "..."},
...
]
}
"""
try:
data = request.get_json()
queries = data.get('queries', [])
if not queries:
return jsonify({'error': 'No queries provided'}), 400
agent_instance = get_agent()
# Process in parallel
answers = await agent_instance.ask_batch(queries)
results = [
{'query': q, 'answer': a}
for q, a in zip(queries, answers)
]
return jsonify({'results': results}), 200
except Exception as e:
app.logger.error(f"Batch processing error: {str(e)}")
return jsonify({'error': 'Internal server error'}), 500
if __name__ == '__main__':
port = int(os.environ.get('PORT', 8080))
app.run(host='0.0.0.0', port=port)
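One caveat with the lazy `get_agent()` singleton above: gunicorn is started with 8 threads, so two concurrent requests can both pass the `if agent is None` check and construct the agent twice. A double-checked-locking sketch avoids that (the `factory` parameter is an illustrative stand-in for the `SearchAssistant` constructor):

```python
import threading

agent = None
_agent_lock = threading.Lock()

def get_agent(factory):
    """Thread-safe lazy initialization: the lock guarantees factory runs once."""
    global agent
    if agent is None:              # fast path once initialized, no lock needed
        with _agent_lock:
            if agent is None:      # re-check under the lock
                agent = factory()
    return agent

# Usage in main.py: get_agent(lambda: SearchAssistant(config_path=config_path))
```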
Step 3: Deploy to Cloud Run
# Build and push container
gcloud builds submit --tag gcr.io/PROJECT_ID/adk-agent
# Deploy to Cloud Run
gcloud run deploy adk-agent \
--image gcr.io/PROJECT_ID/adk-agent \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars "GOOGLE_CLOUD_PROJECT=PROJECT_ID" \
--memory 2Gi \
--cpu 2 \
--timeout 300 \
--concurrency 80 \
--min-instances 0 \
--max-instances 100
# Get service URL
gcloud run services describe adk-agent \
--region us-central1 \
--format 'value(status.url)'
- Concurrency: Set to 80-100 for CPU-intensive agents, 200-300 for I/O-bound
- Memory: 2Gi minimum for agents with memory/vector stores
- Timeout: 300s max (5 min) – optimize agents to respond faster
- Min instances: Set to 1-3 for production to avoid cold starts
- CPU Allocation: Use “CPU always allocated” for consistent performance
Reference: Cloud Run Autoscaling
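The concurrency and instance settings interact: at steady state, Cloud Run needs roughly peak_rps × avg_latency / concurrency instances (Little's law), which is a quick sanity check for the min/max instance bounds. A back-of-the-envelope helper:

```python
import math

def required_instances(peak_rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Little's law: in-flight requests = arrival rate x latency,
    spread across `concurrency` request slots per instance."""
    in_flight = peak_rps * avg_latency_s
    return max(1, math.ceil(in_flight / concurrency))

# 50 req/s at 2 s per response with concurrency 80 -> 2 instances
print(required_instances(50, 2.0, 80))
```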
Cloud Run with Cloud Load Balancing
For production traffic management:
# Create backend service
gcloud compute backend-services create adk-backend \
--global \
--load-balancing-scheme=EXTERNAL_MANAGED
# Add Cloud Run NEG
gcloud compute network-endpoint-groups create adk-neg \
--region=us-central1 \
--network-endpoint-type=SERVERLESS \
--cloud-run-service=adk-agent
gcloud compute backend-services add-backend adk-backend \
--global \
--network-endpoint-group=adk-neg \
--network-endpoint-group-region=us-central1
# Create URL map and load balancer
gcloud compute url-maps create adk-lb \
--default-service adk-backend
gcloud compute target-https-proxies create adk-https-proxy \
--url-map adk-lb
# Reserve IP and create forwarding rule
gcloud compute addresses create adk-ip --global
gcloud compute forwarding-rules create adk-https-rule \
--global \
--target-https-proxy=adk-https-proxy \
--address=adk-ip \
--ports=443
Option 2: Vertex AI Agent Engine Deployment
Vertex AI Agent Engine provides managed orchestration for complex multi-agent systems.
Define Agent Configuration
# agent_config.yaml for Vertex AI Agent Engine
apiVersion: aiplatform.googleapis.com/v1
kind: Agent
metadata:
name: research-assistant
project: my-project
location: us-central1
spec:
# Agent definition
displayName: "Research Assistant Agent"
description: "Multi-agent research assistant with search and synthesis"
# Model configuration
model:
name: "gemini-1.5-pro-002"
parameters:
temperature: 0.7
topP: 0.95
maxOutputTokens: 2048
# Tools
tools:
- name: "google_search"
type: "GOOGLE_SEARCH"
config:
maxResults: 10
safeSearch: true
- name: "custom_database"
type: "FUNCTION"
function:
name: "query_database"
description: "Query PostgreSQL database"
parameters:
type: "object"
properties:
query:
type: "string"
description: "SQL SELECT query"
required: ["query"]
serviceAccount: "agent-sa@project.iam.gserviceaccount.com"
endpoint: "https://db-function-url.run.app"
# Memory configuration
memory:
conversationBuffer:
maxMessages: 20
vectorStore:
enabled: true
indexEndpoint: "projects/PROJECT/locations/us-central1/indexEndpoints/INDEX_ID"
embeddingModel: "textembedding-gecko@003"
# Safety settings
safetySettings:
- category: "HARM_CATEGORY_HATE_SPEECH"
threshold: "BLOCK_MEDIUM_AND_ABOVE"
- category: "HARM_CATEGORY_DANGEROUS_CONTENT"
threshold: "BLOCK_MEDIUM_AND_ABOVE"
# Observability
observability:
cloudTrace:
enabled: true
cloudLogging:
enabled: true
logLevel: "INFO"
bigQueryExport:
enabled: true
dataset: "agent_analytics"
table: "agent_requests"
Deploy Agent to Vertex AI
# Deploy agent
gcloud ai agents deploy research-assistant \
--config agent_config.yaml \
--region us-central1 \
--project my-project
# Get agent endpoint
gcloud ai agents describe research-assistant \
--region us-central1 \
--format 'value(endpoint)'
# Test agent
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"query": "What are the latest developments in multi-agent systems?",
"sessionId": "test-session-123"
}' \
https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/REGION/agents/research-assistant:query
Reference: Vertex AI Agent Engine Documentation
Option 3: GKE (Kubernetes) Deployment
For enterprise-scale deployments with custom requirements.
Kubernetes Deployment Manifest
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: adk-agent
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: adk-agent
template:
metadata:
labels:
app: adk-agent
version: v1.0.0
spec:
serviceAccountName: adk-agent-sa
containers:
- name: agent
image: gcr.io/PROJECT/adk-agent:v1.0.0
ports:
- containerPort: 8080
name: http
env:
- name: GOOGLE_CLOUD_PROJECT
value: "my-project"
- name: AGENT_CONFIG
value: "/config/production.yaml"
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
volumeMounts:
- name: config
mountPath: /config
readOnly: true
- name: secrets
mountPath: /secrets
readOnly: true
volumes:
- name: config
configMap:
name: agent-config
- name: secrets
secret:
secretName: agent-secrets
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: adk-agent
namespace: production
spec:
type: ClusterIP
ports:
- port: 80
targetPort: 8080
protocol: TCP
name: http
selector:
app: adk-agent
---
# hpa.yaml - Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: adk-agent-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: adk-agent
minReplicas: 3
maxReplicas: 100
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 2
periodSeconds: 30
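The manifests above work together: the HPA computes desired replicas as ceil(currentReplicas × currentMetricValue / targetValue), the formula documented for `autoscaling/v2` Utilization targets, then clamps to the min/max bounds. With the 70% CPU target:

```python
import math

def hpa_desired_replicas(current_replicas: int, current_util: float,
                         target_util: float,
                         min_replicas: int = 3, max_replicas: int = 100) -> int:
    """Kubernetes HPA scaling formula for a Utilization-type metric target,
    clamped to the manifest's minReplicas/maxReplicas."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(desired, max_replicas))

# 3 replicas at 95% average CPU against a 70% target -> scale out to 5
print(hpa_desired_replicas(3, 95, 70))
```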
Deploy to GKE
# Create GKE cluster
gcloud container clusters create adk-cluster \
--region us-central1 \
--num-nodes 3 \
--machine-type n2-standard-4 \
--enable-autoscaling \
--min-nodes 3 \
--max-nodes 20 \
--logging=SYSTEM,WORKLOAD \
--monitoring=SYSTEM \
--workload-pool=PROJECT.svc.id.goog
# Get credentials
gcloud container clusters get-credentials adk-cluster \
--region us-central1
# Create namespace
kubectl create namespace production
# Apply manifests
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f hpa.yaml
# Check deployment
kubectl get pods -n production
kubectl get hpa -n production
Observability and Monitoring
Cloud Trace Integration
ADK automatically traces all agent operations when enabled:
"""
Enable Cloud Trace in agent configuration
"""
from google.cloud import trace_v2
from adk import Agent, AgentConfig
from adk.observability import enable_tracing
# Enable tracing
enable_tracing(
project_id='my-project',
sampling_rate=1.0, # 100% sampling for development, 0.1 for production
)
# Agent will automatically trace:
# - LLM calls
# - Tool executions
# - Memory operations
# - Multi-agent communication
agent = Agent(
config=AgentConfig(
name="traced-agent",
observability={
'cloud_trace': True,
'sampling_rate': 0.1, # 10% in production
}
)
)
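A `sampling_rate` of 0.1 means roughly one request in ten gets a recorded trace. Conceptually this is head-based sampling: a single coin flip at the root span that all child spans inherit. A sketch of the idea (not the library's actual implementation):

```python
import random

def should_sample(sampling_rate: float) -> bool:
    """Head-based sampling: decide once per request, before any spans are created,
    so a trace is either recorded completely or not at all."""
    return random.random() < sampling_rate

random.seed(42)
sampled = sum(should_sample(0.1) for _ in range(10_000))
print(sampled)  # roughly 1,000 of 10,000 requests traced
```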
Custom Trace Spans
from adk.observability import trace_span
class CustomAgent:
@trace_span(name="custom_processing")
async def process(self, data):
"""Custom processing with automatic tracing."""
# Your logic here
result = await self.analyze(data)
return result
@trace_span(name="analysis", attributes={"model": "gemini-1.5-pro"})
async def analyze(self, data):
"""Analysis step with custom attributes."""
# Traced automatically
response = await self.agent.run(data)
return response
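If `adk.observability` isn't available in your environment, a decorator with the same shape is straightforward to sketch with OpenTelemetry-style semantics. This is an illustrative stand-in (the `SPANS` list plays the role of a trace exporter), not the ADK implementation:

```python
import asyncio
import functools
import time

SPANS = []  # stand-in for a real trace exporter

def trace_span(name, attributes=None):
    """Record the wrapped coroutine's name, attributes, and wall-clock duration."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return await fn(*args, **kwargs)
            finally:
                SPANS.append({
                    'name': name,
                    'attributes': attributes or {},
                    'duration_ms': (time.monotonic() - start) * 1000,
                })
        return wrapper
    return decorator

@trace_span(name="analysis", attributes={"model": "gemini-1.5-pro"})
async def analyze(data):
    return data.upper()

print(asyncio.run(analyze("hello")))  # HELLO; SPANS now holds one span record
```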
BigQuery Analytics Export
"""
Export agent metrics to BigQuery for analysis
"""
from adk.observability import BigQueryExporter
# Configure BigQuery export
exporter = BigQueryExporter(
project_id='my-project',
dataset_id='agent_analytics',
table_id='agent_requests',
schema=[
{'name': 'timestamp', 'type': 'TIMESTAMP'},
{'name': 'agent_name', 'type': 'STRING'},
{'name': 'session_id', 'type': 'STRING'},
{'name': 'query', 'type': 'STRING'},
{'name': 'response_time_ms', 'type': 'FLOAT'},
{'name': 'token_count', 'type': 'INTEGER'},
{'name': 'tools_used', 'type': 'STRING', 'mode': 'REPEATED'},
{'name': 'success', 'type': 'BOOLEAN'},
{'name': 'error_message', 'type': 'STRING'},
]
)
# Attach to agent
agent = Agent(
config=agent_config,
observability_exporter=exporter,
)
# Metrics are automatically exported
# Query in BigQuery:
"""
SELECT
  agent_name,
  AVG(response_time_ms) AS avg_response_time,
  COUNT(*) AS total_requests,
  COUNTIF(success) AS successful_requests,
  ARRAY(SELECT DISTINCT t FROM UNNEST(ARRAY_CONCAT_AGG(tools_used)) AS t) AS tools_used
FROM `project.agent_analytics.agent_requests`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY agent_name
"""
Cloud Monitoring Dashboards
# Create monitoring dashboard
cat > dashboard.json <<EOF
{
"displayName": "ADK Agent Monitoring",
"dashboardFilters": [],
"gridLayout": {
"widgets": [
{
"title": "Request Rate",
"xyChart": {
"dataSets": [{
"timeSeriesQuery": {
"timeSeriesFilter": {
"filter": "resource.type=\"cloud_run_revision\" resource.labels.service_name=\"adk-agent\"",
"aggregation": {
"alignmentPeriod": "60s",
"perSeriesAligner": "ALIGN_RATE"
}
}
}
}]
}
},
{
"title": "Response Latency (p50, p95, p99)",
"xyChart": {
"dataSets": [{
"timeSeriesQuery": {
"timeSeriesFilter": {
"filter": "resource.type=\"cloud_run_revision\" metric.type=\"run.googleapis.com/request_latencies\"",
"aggregation": {
"alignmentPeriod": "60s",
"crossSeriesReducer": "REDUCE_PERCENTILE_50"
}
}
}
}]
}
}
]
}
}
EOF
gcloud monitoring dashboards create --config-from-file=dashboard.json
Monitoring Reference: Cloud Monitoring Documentation
Security Best Practices
Service Account Setup
# Create service account
gcloud iam service-accounts create adk-agent \
--display-name="ADK Agent Service Account"
# Grant minimum required permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/cloudtrace.agent"
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/logging.logWriter"
# For secret access
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
Secret Manager Integration
# Store API keys in Secret Manager
echo -n "my-db-password" | gcloud secrets create database-password \
--data-file=- \
--replication-policy="automatic"
# Grant access
gcloud secrets add-iam-policy-binding database-password \
--member="serviceAccount:adk-agent@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
# Access secrets in code
from google.cloud import secretmanager
def get_secret(secret_id, project_id, version_id="latest"):
"""Retrieve secret from Secret Manager."""
client = secretmanager.SecretManagerServiceClient()
name = f"projects/{project_id}/secrets/{secret_id}/versions/{version_id}"
response = client.access_secret_version(request={"name": name})
return response.payload.data.decode('UTF-8')
# Use in agent
db_password = get_secret("database-password", "my-project")
VPC Service Controls
# Create service perimeter for production
gcloud access-context-manager perimeters create adk_perimeter \
--title="ADK Production Perimeter" \
--resources=projects/PROJECT_NUMBER \
--restricted-services=aiplatform.googleapis.com,storage.googleapis.com \
--vpc-allowed-services=ALLOW_ALL
# This restricts data exfiltration and ensures agents only access authorized services
CI/CD Pipeline
Cloud Build Configuration
# cloudbuild.yaml
steps:
# Run tests
- name: 'python:3.11'
entrypoint: 'bash'
args:
- '-c'
- |
pip install -r requirements.txt
pip install pytest pytest-asyncio
pytest tests/ -v
# Build container
- name: 'gcr.io/cloud-builders/docker'
args:
- 'build'
- '-t'
- 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
- '-t'
- 'gcr.io/$PROJECT_ID/adk-agent:latest'
- '.'
# Push to Container Registry
- name: 'gcr.io/cloud-builders/docker'
args:
- 'push'
- 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
# Deploy to Cloud Run (staging)
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
entrypoint: 'gcloud'
args:
- 'run'
- 'deploy'
- 'adk-agent-staging'
- '--image'
- 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
- '--region'
- 'us-central1'
- '--platform'
- 'managed'
# Run integration tests
- name: 'python:3.11'
entrypoint: 'bash'
args:
- '-c'
- |
pip install requests
python tests/integration_test.py --url=$(gcloud run services describe adk-agent-staging --region us-central1 --format 'value(status.url)')
# Deploy to production (manual approval required)
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
entrypoint: 'gcloud'
args:
- 'run'
- 'deploy'
- 'adk-agent'
- '--image'
- 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
- '--region'
- 'us-central1'
- '--platform'
- 'managed'
- '--tag'
- 'v$SHORT_SHA'
- '--no-traffic' # Canary deployment
images:
- 'gcr.io/$PROJECT_ID/adk-agent:$SHORT_SHA'
- 'gcr.io/$PROJECT_ID/adk-agent:latest'
options:
machineType: 'N1_HIGHCPU_8'
logging: CLOUD_LOGGING_ONLY
Canary Deployment Strategy
# Deploy new version with no traffic
gcloud run deploy adk-agent \
--image gcr.io/PROJECT/adk-agent:v2 \
--region us-central1 \
--tag v2 \
--no-traffic
# Route 10% of traffic to the canary (v2 is the tag assigned at deploy time)
gcloud run services update-traffic adk-agent \
--region us-central1 \
--to-tags v2=10
# Monitor metrics for 1 hour
# If successful, gradually increase:
gcloud run services update-traffic adk-agent \
--region us-central1 \
--to-tags v2=50
# Full rollout
gcloud run services update-traffic adk-agent \
--region us-central1 \
--to-latest
# Rollback if needed: send all traffic back to the last known-good revision
gcloud run services update-traffic adk-agent \
--region us-central1 \
--to-revisions STABLE_REVISION_NAME=100
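The "monitor metrics for 1 hour" step can be made mechanical: compare the canary's error rate against the stable revision's before widening the split. A hedged sketch of such a promotion gate (the ratio and traffic thresholds are illustrative, not recommended values):

```python
def canary_decision(canary_errors: int, canary_total: int,
                    stable_errors: int, stable_total: int,
                    max_ratio: float = 1.5, min_requests: int = 100) -> str:
    """Promote only if the canary has seen enough traffic and its error rate
    is no worse than max_ratio x the stable revision's."""
    if canary_total < min_requests:
        return "wait"            # not enough data to judge yet
    canary_rate = canary_errors / canary_total
    stable_rate = stable_errors / max(stable_total, 1)
    # Floor the baseline so a perfectly clean stable revision doesn't
    # force rollback on a single canary error
    if canary_rate > max_ratio * max(stable_rate, 0.001):
        return "rollback"
    return "promote"

print(canary_decision(2, 1000, 20, 9000))   # promote
print(canary_decision(50, 1000, 20, 9000))  # rollback
```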
Production Checklist
✅ Pre-Production Checklist
Infrastructure:
- ☐ Choose deployment option (Cloud Run / Vertex AI / GKE)
- ☐ Set up separate dev/staging/production environments
- ☐ Configure autoscaling and resource limits
- ☐ Set up load balancing and CDN if needed
Security:
- ☐ Create service accounts with minimum permissions
- ☐ Store all secrets in Secret Manager
- ☐ Enable VPC Service Controls for data isolation
- ☐ Implement authentication (Cloud IAP, API keys, OAuth)
- ☐ Configure firewalls and network policies
Observability:
- ☐ Enable Cloud Trace with appropriate sampling
- ☐ Configure Cloud Logging with log levels
- ☐ Set up BigQuery export for analytics
- ☐ Create monitoring dashboards
- ☐ Configure alerting (latency, errors, costs)
Testing:
- ☐ Unit tests for all agents and tools
- ☐ Integration tests for agent interactions
- ☐ Load testing with expected traffic patterns
- ☐ Chaos engineering for failure scenarios
CI/CD:
- ☐ Automated testing in pipeline
- ☐ Canary deployment strategy
- ☐ Automated rollback capabilities
- ☐ Version tagging and artifact management
Cost Optimization
Cloud Run Cost Model
# Estimate Cloud Run costs
def estimate_cloud_run_cost(
requests_per_month: int,
avg_response_time_seconds: float,
memory_gb: float = 2,
cpu_count: int = 2,
):
"""
Estimate monthly Cloud Run costs.
Pricing (as of 2025):
- Requests: $0.40 per million
- CPU: $0.00002400 per vCPU-second
- Memory: $0.00000250 per GiB-second
"""
# Request costs
request_cost = (requests_per_month / 1_000_000) * 0.40
# Compute costs
total_cpu_seconds = requests_per_month * avg_response_time_seconds
cpu_cost = total_cpu_seconds * cpu_count * 0.00002400
total_memory_seconds = requests_per_month * avg_response_time_seconds
memory_cost = total_memory_seconds * memory_gb * 0.00000250
total_monthly = request_cost + cpu_cost + memory_cost
return {
'requests': request_cost,
'cpu': cpu_cost,
'memory': memory_cost,
'total': total_monthly
}
# Example calculation
costs = estimate_cloud_run_cost(
requests_per_month=100_000,
avg_response_time_seconds=2.0,
memory_gb=2,
cpu_count=2,
)
print(f"Estimated monthly cost: ${costs['total']:.2f}")
# Output: Estimated monthly cost: $10.64
Optimization Strategies
- Model Selection: Use Gemini 1.5 Flash for simple queries (5x cheaper than Pro)
- Caching: Cache frequent queries with Redis/Memorystore
- Batch Processing: Process multiple queries in single request
- Streaming: Use streaming responses to reduce timeout costs
- Smart Routing: Route simple queries to cheaper models
Cost Reference: Cloud Run Pricing | Vertex AI Pricing
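The smart-routing idea above can be sketched as a cheap heuristic classifier placed in front of the model call. The length and keyword thresholds here are illustrative assumptions, and actual Flash/Pro pricing should be checked against the pricing links:

```python
def pick_model(query: str) -> str:
    """Route short, simple queries to the cheaper Flash model and longer or
    analysis-style queries to Pro. Heuristics are illustrative only."""
    complex_markers = ("compare", "analyze", "summarize", "why", "explain")
    if len(query.split()) > 30 or any(m in query.lower() for m in complex_markers):
        return "gemini-1.5-pro-002"
    return "gemini-1.5-flash-002"

print(pick_model("What is Google ADK?"))                                   # flash
print(pick_model("Compare Cloud Run and GKE for multi-tenant workloads"))  # pro
```

In production, a misroute only costs a slightly worse answer or a slightly higher bill, so even a crude classifier like this captures most of the savings.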
Key Takeaways
- Cloud Run suits serverless REST agents, Vertex AI Agent Engine suits managed multi-agent orchestration, and GKE suits enterprise scale.
- Enable Cloud Trace, Cloud Logging, and BigQuery export before launch, and build dashboards for latency percentiles, error rates, and cost.
- Grant service accounts minimum permissions, keep secrets in Secret Manager, and isolate production with VPC Service Controls.
- Ship through an automated pipeline with canary traffic splits and a rehearsed rollback path.