Executive Summary: Code generation is one of the most powerful applications of multi-agent AI systems, enabling automated software development workflows that can meaningfully accelerate delivery of routine programming work. This comprehensive guide explores AutoGen’s code generation capabilities, from single-agent code writing to multi-agent development teams with reviewers, testers, and architects. After implementing automated coding pipelines for enterprise development teams, I’ve found that well-designed agent workflows can handle complex programming tasks while maintaining code quality through built-in review and testing cycles. Organizations should leverage code generation agents for boilerplate reduction, prototype development, and augmenting developer productivity, while implementing proper sandboxing, review processes, and quality gates.
Code Generation Architecture with AutoGen
AutoGen’s code generation architecture combines LLM-powered code writing with automated execution and feedback loops. The AssistantAgent generates code based on requirements, while the UserProxyAgent executes the code and returns results. This tight feedback loop enables iterative refinement—agents can observe execution errors, analyze output, and modify code until requirements are met.
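To make this concrete, here is a minimal sketch of that write-execute-refine loop with two agents and a local executor. It assumes pyautogen 0.2+ and an OPENAI_API_KEY environment variable; the agent names and task are illustrative, not part of the enterprise system shown later.
"""Minimal write-execute-refine loop (illustrative sketch)."""
import os
from autogen import AssistantAgent, UserProxyAgent
from autogen.coding import LocalCommandLineCodeExecutor

os.makedirs("./scratch", exist_ok=True)  # the executor expects its working directory to exist
llm_config = {"config_list": [{"model": "gpt-4", "api_key": os.getenv("OPENAI_API_KEY")}]}

coder = AssistantAgent(
    name="coder",
    system_message="Write Python code for the task. Say TERMINATE when it runs correctly.",
    llm_config=llm_config,
)
# The proxy executes each proposed code block and feeds stdout/stderr back to the coder,
# which is what drives the iterative refinement described above.
runner = UserProxyAgent(
    name="runner",
    human_input_mode="NEVER",
    code_execution_config={"executor": LocalCommandLineCodeExecutor(work_dir="./scratch")},
    is_termination_msg=lambda m: "TERMINATE" in (m.get("content") or ""),
)

runner.initiate_chat(coder, message="Write and run a script that prints the first 10 prime numbers.")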
Code execution configuration determines safety and capability boundaries. Docker-based execution provides isolation, preventing generated code from affecting the host system. Local execution offers faster iteration but requires careful permission management. Configure timeout limits to prevent infinite loops, and restrict file system access to designated working directories.
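As a short sketch, the two executor options might be configured like this (the Docker path assumes a local Docker daemon; the image, timeout, and directory values are illustrative):
from pathlib import Path
from autogen.coding import DockerCommandLineCodeExecutor, LocalCommandLineCodeExecutor

Path("./sandbox").mkdir(exist_ok=True)  # executors expect the working directory to exist

# Docker: isolated from the host; generated code only sees the mounted work_dir.
docker_executor = DockerCommandLineCodeExecutor(
    image="python:3.11-slim",  # pinned base image
    timeout=60,                # stop runaway or looping code after 60 seconds
    work_dir="./sandbox",
)

# Local: faster iteration, but code runs with the current user's permissions.
local_executor = LocalCommandLineCodeExecutor(timeout=60, work_dir="./sandbox")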
Multi-agent code generation introduces specialized roles: developers write code, reviewers check quality, testers verify functionality, and architects ensure design consistency. This separation of concerns mirrors human development teams, enabling comprehensive code quality without manual intervention. Each agent focuses on its specialty, producing better results than a single agent attempting all tasks.
Building a Code Generation Pipeline
Effective code generation pipelines structure the development process into discrete phases. Requirements analysis extracts specifications from natural language descriptions. Design planning outlines architecture and component structure. Implementation generates actual code. Testing verifies functionality. Review ensures quality standards. Each phase can involve different agents with appropriate expertise.
Prompt engineering for code generation requires precision. Include language specifications, framework requirements, coding standards, and expected output formats. Provide examples of desired code style. Specify error handling expectations, logging requirements, and documentation standards. Well-crafted prompts significantly improve code quality and reduce iteration cycles.
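For example, a task prompt for the pipeline might spell all of this out explicitly; the constraints below are an illustrative skeleton rather than a fixed template, with {task} filled in per request:
CODEGEN_PROMPT = """Implement the task below in Python 3.11.

Task: {task}

Constraints:
- Follow PEP 8 and add type hints to every public function.
- Validate inputs and raise explicit exceptions; never swallow errors silently.
- Log progress with the standard logging module at INFO level.
- Include Google-style docstrings with one usage example per public function.

Output format: one complete, runnable module, followed by a short list of the
assumptions you made."""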
Iterative refinement handles the reality that first-attempt code rarely meets all requirements. Configure agents to analyze execution results, identify issues, and generate fixes. Implement maximum iteration limits to prevent infinite loops on unsolvable problems. Track iteration history to detect patterns indicating fundamental approach problems rather than minor bugs.
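In AutoGen itself, the max_consecutive_auto_reply setting caps how many times the executor will auto-respond; the sketch below shows the same idea as a standalone loop, with run_and_capture and generate_fix standing in for your execution and LLM-call helpers (both hypothetical):
MAX_ITERATIONS = 5

def refine_until_passing(source: str, run_and_capture, generate_fix) -> str:
    """Re-run and patch generated code until it executes cleanly or we give up."""
    recent_errors = []
    for attempt in range(1, MAX_ITERATIONS + 1):
        ok, output = run_and_capture(source)       # execute the current version
        if ok:
            return source
        recent_errors.append(output)
        # The same error three times in a row usually signals a wrong approach,
        # not a minor bug that another patch will fix.
        if len(recent_errors) >= 3 and len(set(recent_errors[-3:])) == 1:
            raise RuntimeError(f"Stuck on the same error after {attempt} attempts")
        source = generate_fix(source, output)      # ask the model for a targeted fix
    raise RuntimeError(f"No passing version within {MAX_ITERATIONS} iterations")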
Python Implementation: Enterprise Code Generation System
Here’s a comprehensive implementation demonstrating enterprise-grade code generation with AutoGen:
"""Microsoft AutoGen - Enterprise Code Generation System"""
import autogen
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
from autogen.coding import LocalCommandLineCodeExecutor, DockerCommandLineCodeExecutor
from typing import Optional, Dict, Any, List, Callable
import os
import tempfile
import logging
from dataclasses import dataclass
from pathlib import Path
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@dataclass
class CodeGenerationConfig:
    """Configuration for code generation system."""
    model: str = "gpt-4"
    temperature: float = 0.2  # Lower for more deterministic code
    max_tokens: int = 4096
    use_docker: bool = False
    work_dir: str = "./workspace"
    timeout: int = 120
    max_iterations: int = 10

class CodeExecutionEnvironment:
    """Manages code execution environment."""

    def __init__(self, config: CodeGenerationConfig):
        self.config = config
        self.work_dir = Path(config.work_dir)
        self.work_dir.mkdir(parents=True, exist_ok=True)
        if config.use_docker:
            self.executor = DockerCommandLineCodeExecutor(
                image="python:3.11-slim",
                timeout=config.timeout,
                work_dir=str(self.work_dir),
            )
        else:
            self.executor = LocalCommandLineCodeExecutor(
                timeout=config.timeout,
                work_dir=str(self.work_dir),
            )

    def get_execution_config(self) -> Dict[str, Any]:
        """Get code execution configuration for agents."""
        return {
            "executor": self.executor,
            "last_n_messages": 3,
        }
class CodeGenerationTeam:
    """Multi-agent code generation team."""

    def __init__(self, config: CodeGenerationConfig):
        self.config = config
        self.env = CodeExecutionEnvironment(config)
        self.agents: Dict[str, autogen.Agent] = {}
        self.llm_config = {
            "config_list": [
                {"model": config.model, "api_key": os.getenv("OPENAI_API_KEY")}
            ],
            "temperature": config.temperature,
            "max_tokens": config.max_tokens,
        }

    def create_developer(self, specialty: str = "general") -> AssistantAgent:
        """Create a developer agent with specified specialty."""
        specialties = {
            "general": "You are an expert software developer proficient in multiple languages.",
            "python": "You are an expert Python developer with deep knowledge of the ecosystem.",
            "frontend": "You are an expert frontend developer skilled in React, Vue, and modern CSS.",
            "backend": "You are an expert backend developer skilled in APIs, databases, and microservices.",
            "data": "You are an expert data engineer skilled in ETL, analytics, and ML pipelines.",
        }
        base_prompt = specialties.get(specialty, specialties["general"])
        developer = AssistantAgent(
            name=f"developer_{specialty}",
            system_message=f"""{base_prompt}
Your responsibilities:
1. Write clean, efficient, well-documented code
2. Follow best practices and coding standards
3. Include comprehensive error handling
4. Write self-documenting code with clear variable names
5. Add type hints for Python code
6. Include docstrings for functions and classes
When writing code:
- Always include necessary imports at the top
- Handle edge cases and potential errors
- Use meaningful variable and function names
- Follow PEP 8 style guidelines for Python
- Include example usage in docstrings
After completing the code, verify it runs without errors.
When the implementation is complete and tested, say TERMINATE.""",
            llm_config=self.llm_config,
        )
        self.agents[f"developer_{specialty}"] = developer
        return developer

    def create_code_reviewer(self) -> AssistantAgent:
        """Create a code review agent."""
        reviewer = AssistantAgent(
            name="code_reviewer",
            system_message="""You are a senior code reviewer with 20+ years of experience.
Your review checklist:
1. **Correctness**: Does the code do what it's supposed to do?
2. **Security**: Are there any security vulnerabilities?
3. **Performance**: Are there any performance issues or inefficiencies?
4. **Maintainability**: Is the code easy to understand and modify?
5. **Error Handling**: Are errors handled appropriately?
6. **Testing**: Is the code testable? Are edge cases covered?
7. **Documentation**: Are functions and classes properly documented?
8. **Style**: Does the code follow consistent style guidelines?
For each issue found:
- Explain the problem clearly
- Suggest a specific fix
- Rate severity (critical/major/minor)
If the code passes review, explicitly approve it.
When review is complete, summarize findings and say TERMINATE.""",
            llm_config=self.llm_config,
        )
        self.agents["code_reviewer"] = reviewer
        return reviewer

    def create_test_engineer(self) -> AssistantAgent:
        """Create a test engineering agent."""
        tester = AssistantAgent(
            name="test_engineer",
            system_message="""You are an expert test engineer specializing in comprehensive testing.
Your responsibilities:
1. Write unit tests for all functions and methods
2. Write integration tests for component interactions
3. Test edge cases and boundary conditions
4. Test error handling paths
5. Ensure high code coverage
Testing standards:
- Use pytest for Python testing
- Include positive and negative test cases
- Test with various input types and sizes
- Mock external dependencies appropriately
- Use descriptive test names that explain what's being tested
After writing tests, run them and report results.
When all tests pass, say TERMINATE.""",
            llm_config=self.llm_config,
        )
        self.agents["test_engineer"] = tester
        return tester

    def create_architect(self) -> AssistantAgent:
        """Create a software architect agent."""
        architect = AssistantAgent(
            name="architect",
            system_message="""You are a senior software architect with expertise in system design.
Your responsibilities:
1. Review code architecture and design patterns
2. Ensure scalability and maintainability
3. Verify separation of concerns
4. Check for proper abstraction levels
5. Evaluate extensibility for future requirements
Design principles to enforce:
- SOLID principles
- DRY (Don't Repeat Yourself)
- KISS (Keep It Simple, Stupid)
- Proper dependency injection
- Clear module boundaries
Provide architectural feedback and suggestions.
When design is approved, say TERMINATE.""",
            llm_config=self.llm_config,
        )
        self.agents["architect"] = architect
        return architect

    def create_executor(self) -> UserProxyAgent:
        """Create a code execution agent."""
        executor = UserProxyAgent(
            name="executor",
            human_input_mode="NEVER",
            max_consecutive_auto_reply=self.config.max_iterations,
            # Guard against messages whose content is None before checking for TERMINATE.
            is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or ""),
            code_execution_config=self.env.get_execution_config(),
        )
        self.agents["executor"] = executor
        return executor

    def create_development_team(self) -> tuple[GroupChat, GroupChatManager]:
        """Create a full development team with all roles."""
        developer = self.create_developer("python")
        reviewer = self.create_code_reviewer()
        tester = self.create_test_engineer()
        architect = self.create_architect()
        executor = self.create_executor()
        agents = [executor, developer, reviewer, tester, architect]
        group_chat = GroupChat(
            agents=agents,
            messages=[],
            max_round=30,
            speaker_selection_method="auto",
            allow_repeat_speaker=True,
        )
        manager = GroupChatManager(
            groupchat=group_chat,
            llm_config=self.llm_config,
        )
        return group_chat, manager

    def generate_code(
        self,
        requirements: str,
        include_tests: bool = True,
        include_review: bool = True
    ) -> Dict[str, Any]:
        """Generate code based on requirements."""
        if include_review:
            group_chat, manager = self.create_development_team()
            task = f"""
Requirements:
{requirements}
Workflow:
1. Developer: Implement the solution
2. Executor: Run the code and verify it works
3. Test Engineer: Write and run comprehensive tests
4. Code Reviewer: Review code quality and suggest improvements
5. Architect: Verify design and architecture
Iterate until all agents approve the implementation.
"""
            executor = self.agents["executor"]
            result = executor.initiate_chat(manager, message=task)
        else:
            developer = self.create_developer("python")
            executor = self.create_executor()
            task = f"""
Requirements:
{requirements}
Implement the solution and verify it runs correctly.
Include error handling and documentation.
"""
            result = executor.initiate_chat(developer, message=task)
        return {
            "chat_history": result.chat_history,
            "work_dir": str(self.env.work_dir),
            "files": list(self.env.work_dir.glob("*")),
        }
# ==================== Specialized Code Generators ====================
class APIGenerator:
    """Specialized generator for REST APIs."""

    def __init__(self, config: CodeGenerationConfig):
        self.team = CodeGenerationTeam(config)

    def generate_fastapi_endpoint(
        self,
        endpoint_spec: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Generate a FastAPI endpoint from specification."""
        requirements = f"""
Create a FastAPI endpoint with the following specification:
Path: {endpoint_spec.get('path', '/api/resource')}
Method: {endpoint_spec.get('method', 'GET')}
Description: {endpoint_spec.get('description', 'API endpoint')}
Request Schema:
{endpoint_spec.get('request_schema', 'None')}
Response Schema:
{endpoint_spec.get('response_schema', 'JSON response')}
Requirements:
- Use Pydantic models for request/response validation
- Include proper error handling with HTTPException
- Add OpenAPI documentation with examples
- Include input validation
- Return appropriate status codes
Also create:
- Unit tests using pytest and httpx
- Example curl commands for testing
"""
        return self.team.generate_code(requirements)

class DataPipelineGenerator:
    """Specialized generator for data pipelines."""

    def __init__(self, config: CodeGenerationConfig):
        self.team = CodeGenerationTeam(config)

    def generate_etl_pipeline(
        self,
        source: str,
        transformations: List[str],
        destination: str
    ) -> Dict[str, Any]:
        """Generate an ETL pipeline."""
        requirements = f"""
Create a data ETL pipeline with the following specification:
Source: {source}
Transformations: {', '.join(transformations)}
Destination: {destination}
Requirements:
- Use pandas for data manipulation
- Include data validation at each stage
- Add logging for pipeline progress
- Handle errors gracefully with retry logic
- Support incremental processing
- Include data quality checks
Also create:
- Configuration file for pipeline parameters
- Unit tests with sample data
- Documentation for running the pipeline
"""
        return self.team.generate_code(requirements)
# ==================== Example Usage ====================
def example_generate_utility():
    """Example: Generate a utility function."""
    config = CodeGenerationConfig(
        model="gpt-4",
        temperature=0.2,
        use_docker=False,
        work_dir="./code_workspace"
    )
    team = CodeGenerationTeam(config)
    requirements = """
Create a Python utility class for rate limiting with the following features:
1. Token bucket algorithm implementation
2. Thread-safe operation using threading locks
3. Configurable rate (tokens per second) and bucket size
4. Support for both blocking and non-blocking modes
5. Decorator for easy function rate limiting
6. Async support for asyncio applications
Include:
- Comprehensive docstrings with examples
- Type hints for all methods
- Unit tests covering all functionality
- Usage examples in the docstring
"""
    result = team.generate_code(requirements, include_tests=True, include_review=True)
    print(f"Generated files: {result['files']}")
    return result

def example_generate_api():
    """Example: Generate a REST API endpoint."""
    config = CodeGenerationConfig(model="gpt-4", temperature=0.2)
    generator = APIGenerator(config)
    endpoint_spec = {
        "path": "/api/users/{user_id}",
        "method": "PUT",
        "description": "Update user profile information",
        "request_schema": {
            "name": "string (optional)",
            "email": "string (optional, must be valid email)",
            "preferences": "object (optional)"
        },
        "response_schema": {
            "id": "integer",
            "name": "string",
            "email": "string",
            "updated_at": "datetime"
        }
    }
    result = generator.generate_fastapi_endpoint(endpoint_spec)
    return result

if __name__ == "__main__":
    print("Running code generation example...")
    result = example_generate_utility()
    print("\nChat History Summary:")
    for msg in result["chat_history"][-5:]:
        print(f"\n[{msg.get('name', 'Unknown')}]: {msg.get('content', '')[:300]}...")
Code Quality and Security Considerations
Generated code requires security review before production deployment. LLMs may produce code with subtle vulnerabilities—SQL injection, path traversal, or insecure defaults. Implement automated security scanning (Bandit for Python, ESLint security plugins for JavaScript) as part of the generation pipeline. Never deploy generated code without human review for security-critical applications.
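A scan step can sit directly in the pipeline; for instance, a minimal Bandit gate might look like the sketch below (it assumes bandit is installed and relies on its non-zero exit code when findings are reported):
import subprocess

def passes_security_scan(path: str) -> bool:
    """Return True only if Bandit reports no findings for the given file or directory."""
    result = subprocess.run(["bandit", "-r", path, "-q"], capture_output=True, text=True)
    if result.returncode != 0:
        print("Security findings:\n" + result.stdout)
        return False
    return True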
Code execution sandboxing prevents generated code from causing system damage. Docker containers provide strong isolation but add overhead. Local execution with restricted permissions offers a middle ground. Configure file system access limits, network restrictions, and resource quotas. Monitor execution for suspicious behavior patterns.
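The executor classes handle isolation within AutoGen; if you need hard resource quotas, one option is to invoke Docker directly with limits. The sketch below uses standard docker run flags, and the specific limits are illustrative:
import subprocess

def run_sandboxed(script_name: str, workspace: str, timeout: int = 120):
    """Run a generated script with no network, capped memory/CPU, and a read-only root FS."""
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",         # no outbound network access
            "--memory", "512m",          # memory quota
            "--cpus", "1.0",             # CPU quota
            "--read-only",               # root filesystem is read-only
            "-v", f"{workspace}:/work",  # only the workspace mount is writable
            "-w", "/work",
            "python:3.11-slim", "python", script_name,
        ],
        capture_output=True, text=True, timeout=timeout,
    )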
Quality gates ensure generated code meets standards before acceptance. Implement linting checks, type checking (mypy for Python), and test coverage requirements. Reject code that fails quality gates and trigger regeneration with feedback about failures. This automated quality enforcement improves output consistency.
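As a rough sketch, a gate that runs type checking and coverage-enforced tests before accepting output could look like this (it assumes mypy, pytest, and pytest-cov are installed; the 80% threshold is illustrative):
import subprocess

def passes_quality_gate(workspace: str, min_coverage: int = 80) -> bool:
    """Run static checks and tests; a failure here should trigger regeneration with feedback."""
    checks = [
        ["mypy", workspace],
        ["pytest", workspace, f"--cov={workspace}", f"--cov-fail-under={min_coverage}"],
    ]
    for cmd in checks:
        if subprocess.run(cmd, capture_output=True).returncode != 0:
            print("Quality gate failed:", " ".join(cmd))
            return False
    return True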

Key Takeaways and Best Practices
Code generation with AutoGen enables powerful automated development workflows when properly configured. Use multi-agent teams with specialized roles for complex projects. Implement comprehensive sandboxing for code execution. Enforce quality gates through automated testing and review.
The Python examples provided here establish patterns for enterprise-grade code generation. Start with simple single-agent generation for prototyping, then scale to full development teams for production code. In the next article, we’ll explore integrating Retrieval-Augmented Generation (RAG) with AutoGen for knowledge-enhanced agent capabilities.