📚 Microsoft AutoGen Series
Building on communication patterns from Part 2, we now apply them to automated code generation—one of the most powerful applications of multi-agent systems.
Code Generation Architecture with AutoGen
AutoGen’s code generation architecture combines LLM-powered code writing with automated execution and feedback loops. The AssistantAgent generates code based on requirements, while the UserProxyAgent executes the code and returns results. This tight feedback loop enables iterative refinement—agents can observe execution errors, analyze output, and modify code until requirements are met.
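In its simplest form this loop needs only two agents. The sketch below is a minimal illustration using the classic pyautogen API; the model name, placeholder API key, and working directory are our own assumptions, not fixed conventions:

from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="coder",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": "your-key"}]},
)
executor = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",          # fully automated, no human in the loop
    max_consecutive_auto_reply=5,      # bound the refinement loop
    code_execution_config={"work_dir": "scratch", "use_docker": True},
)
# The proxy runs each code block the assistant emits and sends back
# stdout/stderr, so the assistant can fix errors across turns.
executor.initiate_chat(assistant, message="Write a script that prints the first 10 primes.")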
Code execution configuration determines safety and capability boundaries. Docker-based execution provides isolation, preventing generated code from affecting the host system. Local execution offers faster iteration but requires careful permission management. Configure timeout limits to prevent infinite loops, and restrict file system access to designated working directories.
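As a rough sketch, the two execution styles differ only in their code_execution_config; the directory name and timeout values below are illustrative choices, not recommended defaults:

# Docker-based execution: isolated from the host, slower to start.
docker_exec = {
    "work_dir": "generated_code",  # confine file access to this directory
    "use_docker": True,            # run inside a container
    "timeout": 60,                 # kill runaway code after 60 seconds
}

# Local execution: faster iteration, weaker isolation.
local_exec = {
    "work_dir": "generated_code",
    "use_docker": False,
    "timeout": 30,
}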
Multi-agent code generation introduces specialized roles: developers write code, reviewers check quality, testers verify functionality, and architects ensure design consistency. This separation of concerns mirrors human development teams, enabling comprehensive code quality without manual intervention.
flowchart TB
REQ[Requirements] --> PLAN[Planning Agent]
PLAN --> DEV[Developer Agent]
DEV --> EXEC[Code Executor]
EXEC --> REV[Review Agent]
REV -->|Issues| DEV
REV -->|OK| TEST[Test Agent]
TEST -->|Fail| DEV
TEST -->|Pass| OUT[Final Code]
style PLAN fill:#667eea,color:white
style DEV fill:#48bb78,color:white
style REV fill:#ed8936,color:white

Figure 1: Code Generation Pipeline with Iterative Refinement
Building a Code Generation Pipeline
Effective code generation pipelines structure the development process into discrete phases. Requirements analysis extracts specifications from natural language descriptions. Design planning outlines architecture and component structure. Implementation generates actual code. Testing verifies functionality. Review ensures quality standards. Each phase can involve different agents with appropriate expertise.
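One way to make these phases explicit is a simple enum; the phase-to-agent mapping below is a hypothetical convention that mirrors the roles used in the implementation later in this post:

from enum import Enum

class Phase(Enum):
    REQUIREMENTS = "requirements"      # extract specs from natural language
    DESIGN = "design"                  # outline architecture and components
    IMPLEMENTATION = "implementation"  # generate actual code
    TESTING = "testing"                # verify functionality
    REVIEW = "review"                  # enforce quality standards

# Hypothetical assignment of each phase to an agent role.
PHASE_OWNERS = {
    Phase.REQUIREMENTS: "Architect",
    Phase.DESIGN: "Architect",
    Phase.IMPLEMENTATION: "Developer",
    Phase.TESTING: "Tester",
    Phase.REVIEW: "Reviewer",
}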
Prompt engineering for code generation requires precision. Include language specifications, framework requirements, coding standards, and expected output formats. Provide examples of desired code style. Specify error handling expectations, logging requirements, and documentation standards. Well-crafted prompts significantly improve code quality and reduce iteration cycles.
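A small helper can enforce this precision. build_codegen_prompt is a hypothetical template of ours, and the specific constraint lines are examples rather than a prescribed format:

from typing import Optional

def build_codegen_prompt(task: str, language: str = "python",
                         framework: Optional[str] = None) -> str:
    """Assemble a precise code-generation prompt (illustrative template only)."""
    target = f"{language} using {framework}" if framework else language
    return "\n".join([
        f"Implement the following in {target}:",
        task,
        "Constraints:",
        "- Follow PEP8; include type hints and docstrings.",
        "- Handle errors explicitly and log failures.",
        "- Return the implementation as a single fenced code block.",
    ])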
Iterative refinement handles the reality that first-attempt code rarely meets all requirements. Configure agents to analyze execution results, identify issues, and generate fixes. Implement maximum iteration limits to prevent infinite loops on unsolvable problems. Track iteration history to detect patterns indicating fundamental approach problems rather than minor bugs.
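A hedged sketch of such a guard, where run_attempt is a hypothetical stand-in for one generate-and-execute round:

from typing import Callable, List, Tuple

def refine(run_attempt: Callable[[List[Tuple[bool, str]]], Tuple[bool, str]],
           max_iterations: int = 5) -> List[Tuple[bool, str]]:
    """Retry generation with a hard cap, watching history for stuck patterns."""
    history: List[Tuple[bool, str]] = []
    for _ in range(max_iterations):
        success, error = run_attempt(history)  # returns (success, error_message)
        history.append((success, error))
        if success:
            break
        # The same error twice in a row suggests a flawed approach,
        # not a minor bug worth another retry.
        if len(history) >= 2 and history[-1][1] == history[-2][1]:
            break
    return history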
Python Implementation: Enterprise Code Generation System
"""Microsoft AutoGen - Enterprise Code Generation System"""
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
from typing import Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class CodeQuality(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    TESTED = "tested"
    PRODUCTION = "production"


@dataclass
class CodeGenerationConfig:
    language: str = "python"
    framework: Optional[str] = None
    style_guide: str = "PEP8"
    max_iterations: int = 5
    require_tests: bool = True
    require_docs: bool = True


class CodeGenerationPipeline:
    """Enterprise-grade code generation with AutoGen."""

    def __init__(self, llm_config: Dict[str, Any], config: CodeGenerationConfig):
        self.llm_config = llm_config
        self.config = config
        self.agents = {}
        self._setup_agents()

    def _setup_agents(self):
        """Initialize specialized code generation agents."""
        # Planning Agent
        self.agents["architect"] = AssistantAgent(
            name="Architect",
            system_message=f"""You are a software architect.
Your responsibilities:
1. Analyze requirements and identify components
2. Design clean, modular architecture
3. Define interfaces between components
4. Consider scalability and maintainability
Language: {self.config.language}
Framework: {self.config.framework or 'None specified'}
Provide clear design specifications for the developer.""",
            llm_config=self.llm_config,
        )

        # Developer Agent
        self.agents["developer"] = AssistantAgent(
            name="Developer",
            system_message=f"""You are an expert {self.config.language} developer.
Your responsibilities:
1. Implement code based on the architect's design
2. Follow the {self.config.style_guide} style guide
3. Include comprehensive error handling
4. Add type hints and docstrings
5. Write clean, maintainable code
Always wrap code in proper code blocks.
After implementation, say IMPLEMENTATION_COMPLETE.""",
            llm_config=self.llm_config,
        )

        # Code Reviewer Agent
        self.agents["reviewer"] = AssistantAgent(
            name="Reviewer",
            system_message="""You are a senior code reviewer.
Your responsibilities:
1. Review code for bugs and logic errors
2. Check for security vulnerabilities
3. Verify coding standards compliance
4. Suggest performance improvements
5. Ensure proper error handling
Provide specific, actionable feedback.
If code passes review, say REVIEW_APPROVED.""",
            llm_config=self.llm_config,
        )

        # Test Engineer Agent
        self.agents["tester"] = AssistantAgent(
            name="Tester",
            system_message=f"""You are a test engineer.
Your responsibilities:
1. Write comprehensive unit tests
2. Test edge cases and error conditions
3. Verify expected behavior
4. Check input validation
Use pytest for {self.config.language}.
After writing tests, say TESTS_COMPLETE.""",
            llm_config=self.llm_config,
        )

        # Executor Agent
        self.agents["executor"] = UserProxyAgent(
            name="Executor",
            human_input_mode="NEVER",
            max_consecutive_auto_reply=self.config.max_iterations,
            code_execution_config={
                "work_dir": "generated_code",
                "use_docker": True,
                "timeout": 60,
                "last_n_messages": 3,
            },
            # Guard against messages whose content is None.
            is_termination_msg=lambda x: any(
                term in (x.get("content") or "")
                for term in ["TERMINATE", "PIPELINE_COMPLETE"]
            ),
        )

    def create_pipeline(self) -> tuple[GroupChat, GroupChatManager]:
        """Create the code generation pipeline."""
        agents = [
            self.agents["executor"],
            self.agents["architect"],
            self.agents["developer"],
            self.agents["reviewer"],
        ]
        if self.config.require_tests:
            agents.append(self.agents["tester"])

        group_chat = GroupChat(
            agents=agents,
            messages=[],
            max_round=30,
            speaker_selection_method="auto",
        )
        manager = GroupChatManager(
            groupchat=group_chat,
            llm_config=self.llm_config,
        )
        return group_chat, manager

    def generate(self, requirements: str) -> Dict[str, Any]:
        """Generate code from requirements."""
        _, manager = self.create_pipeline()

        prompt = f"""
Generate production-quality code for the following requirements:
{requirements}
Process:
1. Architect: Design the solution
2. Developer: Implement the code
3. Reviewer: Review for quality
4. Tester: Write and run tests (if required)
When all steps complete successfully, say PIPELINE_COMPLETE.
"""
        result = self.agents["executor"].initiate_chat(
            manager,
            message=prompt,
        )
        # Scan the chat history for the completion marker; str(result)
        # is not a reliable place to look for message content.
        history = getattr(result, "chat_history", []) or []
        success = any(
            "PIPELINE_COMPLETE" in (msg.get("content") or "") for msg in history
        )
        return {"success": success, "messages": history}


# Example usage
def main():
    llm_config = {
        "config_list": [{"model": "gpt-4", "api_key": "your-key"}],
        "temperature": 0.3,
    }
    config = CodeGenerationConfig(
        language="python",
        framework="FastAPI",
        require_tests=True,
        require_docs=True,
    )
    pipeline = CodeGenerationPipeline(llm_config, config)
    requirements = """
Create a REST API endpoint for user authentication:
- POST /auth/login: Accept email/password, return JWT token
- POST /auth/register: Create new user with email validation
- Include password hashing with bcrypt
- Add rate limiting (5 attempts per minute)
"""
    result = pipeline.generate(requirements)
    print(f"Generation {'succeeded' if result['success'] else 'failed'}")


if __name__ == "__main__":
    main()
Conclusion
Code generation represents one of the most impactful applications of multi-agent AI. By combining specialized agents for architecture, development, review, and testing, we can automate significant portions of the development workflow while maintaining high quality standards.
📌 Key Takeaways
- Code generation pipelines combine planning, development, review, and testing agents
- Docker isolation is essential for safe code execution
- Iterative refinement handles execution errors automatically
- Specialized agents mirror human development team roles
- Prompt engineering significantly impacts code quality
🔜 Coming Up: Part 4: RAG Integration
We’ll enhance our agents with Retrieval-Augmented Generation to ground responses in factual, domain-specific knowledge.