Emerging Technologies – Page 32 – C4: Container, Code, Cloud & Context

LlamaIndex: The Data Framework for Building Production RAG Applications

Posted on April 15, 2025 by Nithin Mohan TK · 8 min read

Introduction: LlamaIndex (formerly GPT Index) is the leading data framework for building LLM applications over your private data. While LangChain focuses on chains and agents, LlamaIndex specializes in data ingestion, indexing, and retrieval—the core components of Retrieval Augmented Generation (RAG). With over 160 data connectors through LlamaHub, sophisticated indexing strategies, and production-ready query engines, LlamaIndex […]

Function Calling Deep Dive: Building LLM-Powered Tools and Agents

Posted on April 15, 2025 by Nithin Mohan TK · 9 min read

Introduction: Function calling transforms LLMs from text generators into action-taking agents. Instead of just describing what to do, the model can actually do it—query databases, call APIs, execute code, and interact with external systems. OpenAI’s function calling (now called “tools”) and similar features from Anthropic and others let you define available functions, and the model […]

RESTful AI API Design: Best Practices for LLM APIs

Posted on April 15, 2025 by Nithin Mohan TK · 13 min read

Designing RESTful APIs for LLMs requires careful consideration. After building 30+ LLM APIs, I’ve learned what works. Here’s the complete guide to RESTful AI API design.

Figure 1: RESTful AI API Architecture

Why LLM APIs Are Different

LLM APIs have unique requirements:

Async operations: LLM inference can take seconds or minutes
Streaming responses: Need to […]

LLM Rate Limiting: Maximizing API Throughput Without Getting Throttled

Posted on April 13, 2025 by Nithin Mohan TK · 16 min read

Introduction: LLM APIs have strict rate limits—requests per minute, tokens per minute, and concurrent request limits. Hit these limits and your application grinds to a halt with 429 errors. Effective rate limiting isn’t just about staying under limits; it’s about maximizing throughput while maintaining reliability. This guide covers practical rate limiting patterns: token bucket algorithms […]

Azure Application Gateway: A Solutions Architect’s Guide to Regional Load Balancing and WAF

Posted on April 13, 2025 by Nithin Mohan TK · 6 min read

While Azure Front Door excels at global load balancing, many enterprise scenarios require regional application delivery with deep integration into virtual network architectures. Azure Application Gateway fills this niche perfectly, providing Layer 7 load balancing with integrated Web Application Firewall capabilities within a single Azure region. Having architected countless regional application delivery solutions over my […]

Getting Started with React and ViteJS: Enterprise-Grade Frontend Scaffolding Guide

Posted on April 12, 2025 by Nithin Mohan TK · 8 min read

Building modern React applications shouldn’t feel like wrestling with complex toolchains. Vite has fundamentally changed how we approach frontend development, offering lightning-fast builds and an exceptional developer experience that enterprise teams are increasingly adopting.

Introduction

This guide walks you through setting up a production-ready React application using Vite as your build tool. We’ll cover project […]
