# Microsoft AutoGen Series
With RAG-enhanced agents in place from Part 4, we now turn to deploying multi-agent systems to production on Kubernetes.
## 1. The Multi-Agent Scaling Challenge
Single-agent systems are straightforward to deploy. Multi-agent systems introduce complexity:
- State management: Agent conversations span multiple requests
- Communication patterns: Agents need reliable pub/sub messaging
- Resource allocation: Different agents have different compute needs
- Error handling: One agent failure shouldn’t crash the entire system
- Monitoring: Debugging distributed agent conversations is hard
```mermaid
flowchart TB
    subgraph K8s[Kubernetes Cluster]
        ING[Ingress] --> API[API Gateway]
        API --> ORCH[Orchestrator]
        ORCH --> AG1[Agent Pod 1]
        ORCH --> AG2[Agent Pod 2]
        ORCH --> AG3[Agent Pod 3]
        AG1 & AG2 & AG3 --> REDIS[(Redis)]
        AG1 & AG2 & AG3 --> KAFKA[(Kafka)]
    end
    PROM[Prometheus] --> AG1 & AG2 & AG3
    style ORCH fill:#667eea,color:white
    style REDIS fill:#ed8936,color:white
```

*Figure 1: Multi-Agent Kubernetes Architecture*
## 2. Kubernetes Architecture
A production multi-agent architecture requires several key components working together:
| Component | Purpose | Technology |
|---|---|---|
| API Gateway | Request routing, rate limiting, auth | Kong, Ambassador, NGINX |
| Orchestrator | Agent coordination, task distribution | Custom Python service |
| State Store | Conversation persistence, checkpoints | Redis Cluster |
| Message Queue | Async agent communication | Kafka, RabbitMQ |
| Monitoring | Metrics, tracing, alerting | Prometheus + Grafana |
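Whichever broker carries the async agent traffic, giving every agent-to-agent message a common envelope makes correlation and de-duplication tractable downstream. A minimal sketch — the field names and the `build_task_envelope` helper are illustrative, not AutoGen's wire format:

```python
# Sketch: a JSON envelope agents could publish to Kafka/RabbitMQ.
# Field names are illustrative, not a fixed AutoGen schema.
import json
import uuid
from datetime import datetime, timezone


def build_task_envelope(conversation_id: str, sender: str,
                        recipient: str, payload: dict) -> bytes:
    """Serialize one agent-to-agent task as a broker message body."""
    envelope = {
        "message_id": uuid.uuid4().hex,      # de-duplication key
        "conversation_id": conversation_id,  # correlation across agents/pods
        "sender": sender,
        "recipient": recipient,
        "sent_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    return json.dumps(envelope).encode("utf-8")
```

The `message_id` lets consumers drop redelivered messages, and the `conversation_id` is the same correlation key the state store and logs use.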
## 3. Kubernetes Deployment Configuration
```yaml
# autogen-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: autogen-orchestrator
  labels:
    app: autogen
    component: orchestrator
spec:
  replicas: 3
  selector:
    matchLabels:
      app: autogen
      component: orchestrator
  template:
    metadata:
      labels:
        app: autogen
        component: orchestrator
    spec:
      containers:
        - name: orchestrator
          image: autogen-orchestrator:v1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          env:
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: redis-secret
                  key: url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-secret
                  key: api-key
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: autogen-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: autogen-orchestrator
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
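The `readinessProbe` above only helps if the `/ready` handler actually exercises the pod's dependencies. One way to structure that check, sketched with illustrative names — a real handler would ping Redis and Kafka rather than return constants:

```python
# Sketch of the logic behind a /ready endpoint: the pod reports ready
# only when every dependency check passes. Names are illustrative.
from typing import Callable, Dict


def readiness(checks: Dict[str, Callable[[], bool]]) -> Dict[str, object]:
    """Run each dependency check; the pod is ready only if all pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A failing or crashing check marks the pod not-ready,
            # but never crashes the probe handler itself.
            results[name] = False
    return {"ready": all(results.values()), "checks": results}


# Example wiring (a real service would ping Redis/Kafka here):
status = readiness({
    "redis": lambda: True,
    "kafka": lambda: True,
})
```

Returning the per-dependency results alongside the overall flag makes "why is this pod not ready?" answerable from the probe response alone.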
## 4. State Management with Redis
"""Redis-based state management for AutoGen."""
import redis
import json
from typing import Dict, Any, Optional
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta
@dataclass
class ConversationState:
conversation_id: str
agents: list
messages: list
status: str = "active"
created_at: str = None
updated_at: str = None
checkpoint_id: Optional[str] = None
def __post_init__(self):
now = datetime.utcnow().isoformat()
if not self.created_at:
self.created_at = now
self.updated_at = now
class RedisStateManager:
"""Distributed state management for multi-agent systems."""
def __init__(self, redis_url: str, ttl_hours: int = 24):
self.redis = redis.from_url(redis_url)
self.ttl = timedelta(hours=ttl_hours)
def save_state(self, state: ConversationState) -> None:
"""Save conversation state."""
key = f"conversation:{state.conversation_id}"
state.updated_at = datetime.utcnow().isoformat()
self.redis.setex(
key,
self.ttl,
json.dumps(asdict(state))
)
def get_state(self, conversation_id: str) -> Optional[ConversationState]:
"""Retrieve conversation state."""
key = f"conversation:{conversation_id}"
data = self.redis.get(key)
if data:
return ConversationState(**json.loads(data))
return None
def create_checkpoint(self, conversation_id: str) -> str:
"""Create a checkpoint for recovery."""
state = self.get_state(conversation_id)
if not state:
raise ValueError(f"Conversation {conversation_id} not found")
checkpoint_id = f"{conversation_id}:{datetime.utcnow().timestamp()}"
checkpoint_key = f"checkpoint:{checkpoint_id}"
self.redis.setex(
checkpoint_key,
timedelta(hours=72), # Longer TTL for checkpoints
json.dumps(asdict(state))
)
state.checkpoint_id = checkpoint_id
self.save_state(state)
return checkpoint_id
def restore_checkpoint(self, checkpoint_id: str) -> ConversationState:
"""Restore from checkpoint."""
key = f"checkpoint:{checkpoint_id}"
data = self.redis.get(key)
if not data:
raise ValueError(f"Checkpoint {checkpoint_id} not found")
state = ConversationState(**json.loads(data))
state.status = "restored"
self.save_state(state)
return state
def add_message(self, conversation_id: str, message: Dict[str, Any]) -> None:
"""Add message to conversation."""
state = self.get_state(conversation_id)
if state:
state.messages.append(message)
self.save_state(state)
## 5. Monitoring and Observability
Effective monitoring is critical for production multi-agent systems:
- Prometheus metrics: Agent latency, token usage, error rates
- Distributed tracing: Track conversations across agents with OpenTelemetry
- Log aggregation: Centralize logs with conversation correlation IDs
- Alerting: Token budget alerts, high latency, error thresholds
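For the log-aggregation point above, emitting one JSON object per record with the conversation ID attached is usually enough for correlation across pods. A minimal stdlib sketch — the field names and logger name are illustrative:

```python
# Sketch: JSON log records carrying a conversation correlation ID, so lines
# from different agent pods can be joined in a log aggregator.
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Attributes injected via logging's `extra=` mechanism:
            "conversation_id": getattr(record, "conversation_id", None),
            "agent": getattr(record, "agent", None),
        }
        return json.dumps(payload)


logger = logging.getLogger("autogen")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("agent reply sent",
            extra={"conversation_id": "conv-123", "agent": "planner"})
```

Because every record carries `conversation_id`, a single query in the aggregator reconstructs a whole multi-agent conversation in order.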
## Conclusion
Deploying multi-agent systems to production requires careful attention to state management, scaling, and observability. Kubernetes provides the orchestration layer, but the real challenges are in managing distributed agent conversations and ensuring reliability.
## Key Takeaways
- Redis for distributed state management across pods
- Kafka/RabbitMQ for async agent communication
- HPA for automatic scaling based on CPU/memory
- Checkpointing enables conversation recovery
- Prometheus + Grafana for comprehensive observability
## Coming Up: Part 6: Advanced Patterns
We’ll explore hierarchical team structures, workflow state machines, and enterprise system integration.