AI Deployment – C4: Container, Code, Cloud & Context

Production Model Deployment Patterns: From REST APIs to Kubernetes Orchestration in Python

Posted on December 3, 2025 by Nithin Mohan TK · 8 min read

After deploying hundreds of ML models to production across startups and enterprises, I’ve learned that model deployment is where most AI projects fail. Not because the models don’t work, but because teams underestimate the engineering complexity of serving predictions reliably at scale. This article shares production-tested deployment patterns from REST APIs to Kubernetes orchestration. 1. The […]


Production RAG Architecture: Building Scalable Vector Search Systems

Posted on March 14, 2025 by Nithin Mohan TK · 4 min read

Three months into production, our RAG system started failing at 2 AM. Not gracefully: complete outages. The problem wasn’t the models or the embeddings. It was the architecture. After rebuilding it twice, here’s what I learned about building RAG systems that actually work in production. (Figure 1: Production RAG Architecture Overview.) The Night Everything Broke: It was […]


Running LLMs on Kubernetes: Production Deployment Guide

Posted on February 20, 2025 by Nithin Mohan TK · 7 min read

Deploying LLMs on Kubernetes requires careful planning. After deploying 25+ LLMs on Kubernetes, I’ve learned what works. Here’s the complete guide to running LLMs on Kubernetes in production. (Figure 1: Kubernetes LLM Architecture.) Why Kubernetes for LLMs: Kubernetes offers significant advantages for LLM deployment. Scalability: auto-scale based on demand. Resource management: efficient GPU and […]


Deploying LLM Applications on Cloud Run: A Complete Guide

Posted on November 5, 2024 by Nithin Mohan TK · 6 min read

Last year, I deployed our first LLM application to Cloud Run. What should have taken hours took three days. Cold starts killed our latency. Memory limits caused crashes. Timeouts broke long-running requests. After deploying 20+ LLM applications to Cloud Run, I’ve learned what works and what doesn’t. Here’s the complete guide. Figure 1: Cloud Run […]

