GenAI Frameworks Compared: Building Real Applications with LangChain and LlamaIndex

You’ve got the fundamentals. You can call an LLM API and get responses. Now what?

Raw API calls work for simple use cases, but real applications need more: document retrieval, conversation memory, structured outputs, error handling, tool use. That’s where frameworks come in.

I’ve built production systems with all the major frameworks. Here’s an honest assessment of each—what they’re good at, where they fall short, and when to use them.

Series Navigation: Part 1: GenAI Intro → Part 2: LLMs → Part 3: Frameworks (You are here) → Part 4: Agentic AI → Part 5: Building Agents → Part 6: Enterprise

The Framework Landscape (Mid-2025)

The ecosystem has matured significantly. Here’s the current state:

| Framework       | Primary Strength       | Best For                                    | Learning Curve |
|-----------------|------------------------|---------------------------------------------|----------------|
| LangChain       | Comprehensive toolkit  | Complex chains, agents, broad integrations  | Moderate-High  |
| LlamaIndex      | Data/document handling | RAG systems, knowledge bases                | Low-Moderate   |
| Semantic Kernel | Enterprise .NET/Python | Enterprise apps, Microsoft stack            | Moderate       |
| Haystack        | Production pipelines   | Search, Q&A systems                         | Moderate       |
| LiteLLM         | Model abstraction      | Multi-model applications                    | Low            |

LangChain: The Swiss Army Knife

LangChain is the most popular framework, and for good reason—it has everything. But that “everything” can also be a curse.

When to Use LangChain

  • Building complex multi-step chains
  • Need broad integrations (vector stores, tools, APIs)
  • Building agents with tool use
  • Prototyping—it’s fast to get something working

When to Avoid

  • Simple use cases (overkill, adds complexity)
  • When you need fine-grained control over every API call
  • Performance-critical paths (abstraction overhead)

# LangChain basics - conversational chain with memory
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain

# Initialize model
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Memory - keeps last 5 exchanges
memory = ConversationBufferWindowMemory(k=5)

# Create conversational chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # Shows chain execution
)

# Have a conversation
response1 = conversation.predict(input="Hi, I'm Alex. I'm building a fintech app.")
print(response1)

response2 = conversation.predict(input="What database should I use?")
print(response2)  # Will remember you're building a fintech app

response3 = conversation.predict(input="What was my name again?")
print(response3)  # Will remember you're Alex
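
LangChain also handles the structured outputs mentioned at the top: bind a Pydantic schema to the model and you get typed objects back instead of free text. A minimal sketch, assuming a hypothetical Ticket schema and made-up example values:

# Structured output - bind a Pydantic schema to the model
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Ticket(BaseModel):
    """Classification of a support ticket."""
    category: str = Field(description="One of: billing, bug, feature_request")
    urgency: int = Field(description="1 (low) to 5 (critical)")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
structured_llm = llm.with_structured_output(Ticket)

# Returns a Ticket instance, not a string
ticket = structured_llm.invoke("My invoice was charged twice and I need a refund today!")
print(ticket.category, ticket.urgency)  # e.g. billing 4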

LangChain RAG Implementation

# Production-ready RAG with LangChain
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader

def create_rag_system(pdf_paths: list[str]):
    """Create a RAG system from PDF documents."""
    
    # 1. Load documents
    documents = []
    for path in pdf_paths:
        loader = PyPDFLoader(path)
        documents.extend(loader.load())
    
    # 2. Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", " ", ""]
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split into {len(chunks)} chunks")
    
    # 3. Create embeddings and vector store
    embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )
    
    # 4. Create retrieval chain
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # or "map_reduce" for large docs
        retriever=vectorstore.as_retriever(
            search_type="mmr",  # Maximum Marginal Relevance
            search_kwargs={"k": 5}
        ),
        return_source_documents=True
    )
    
    return qa_chain

# Usage
qa = create_rag_system(["company_policies.pdf", "employee_handbook.pdf"])
result = qa.invoke({"query": "What is the vacation policy?"})
print(result["result"])
print(f"Sources: {[doc.metadata for doc in result['source_documents']]}")

LlamaIndex: The Data Specialist

LlamaIndex (formerly GPT Index) excels at one thing: connecting LLMs with your data. If your primary use case is RAG or building knowledge bases, LlamaIndex is often simpler than LangChain.

When to Use LlamaIndex

  • Document Q&A systems
  • Knowledge bases over structured/unstructured data
  • When data quality and retrieval matter most
  • Multi-document synthesis

# LlamaIndex - cleaner RAG implementation
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure defaults
Settings.llm = OpenAI(model="gpt-4o", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Create index - handles chunking and embedding automatically
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings in the Q3 report?")
print(response)

Advanced LlamaIndex: Multi-Document Agents

# Multi-document setup - query tools per document behind a router agent
from pathlib import Path

from llama_index.core import VectorStoreIndex, SummaryIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.agent.openai import OpenAIAgent

def create_document_tools(doc_path: str, doc_description: str):
    """Create query tools (vector search + summary) for a single document."""

    # Unique prefix so tool names don't collide across documents
    doc_name = Path(doc_path).stem
    
    # Load single document
    documents = SimpleDirectoryReader(input_files=[doc_path]).load_data()
    
    # Vector index for specific questions
    vector_index = VectorStoreIndex.from_documents(documents)
    
    # Summary index for high-level questions
    summary_index = SummaryIndex.from_documents(documents)
    
    # Create tools for different query types
    vector_tool = QueryEngineTool(
        query_engine=vector_index.as_query_engine(),
        metadata=ToolMetadata(
            name="vector_search",
            description=f"Search for specific details in {doc_description}"
        )
    )
    
    summary_tool = QueryEngineTool(
        query_engine=summary_index.as_query_engine(),
        metadata=ToolMetadata(
            name="summary",
            description=f"Get a summary or high-level overview of {doc_description}"
        )
    )
    
    return [vector_tool, summary_tool]

# Build query tools for each document
tools = []
tools.extend(create_document_tools("financials_2024.pdf", "2024 Financial Report"))
tools.extend(create_document_tools("strategy_doc.pdf", "Company Strategy Document"))

# Router agent picks the right tool
agent = OpenAIAgent.from_tools(tools, verbose=True)

# Now it can answer questions across documents
response = agent.chat("Compare the 2024 revenue with the strategy goals")

Semantic Kernel: Enterprise & Microsoft Stack

Microsoft’s Semantic Kernel is the choice if you’re in a .NET shop or building on Azure. It also has a capable Python SDK, with a focus on enterprise patterns.

# Semantic Kernel - services, plugins, and prompt functions
import asyncio

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.core_plugins import TextPlugin, TimePlugin

# Initialize kernel
kernel = sk.Kernel()

# Add AI service
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4",
        ai_model_id="gpt-4o"
    )
)

# Add built-in plugins
kernel.add_plugin(TextPlugin(), "text")
kernel.add_plugin(TimePlugin(), "time")

# Create a custom function (plugin)
summarize_prompt = """
Summarize the following text in {{$style}} style:

{{$input}}

Summary:
"""

summarize_function = kernel.add_function(
    plugin_name="writer",
    function_name="summarize",
    prompt=summarize_prompt
)

# Execute - kernel.invoke is a coroutine, so run it inside an event loop
async def main():
    result = await kernel.invoke(
        summarize_function,
        input="Long article text here...",
        style="executive brief"
    )
    print(result)

asyncio.run(main())

Building RAG Right: Practical Patterns

Regardless of framework, RAG systems share common challenges. Here’s what works:

Chunking Strategy

# Smart chunking - respect document structure
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language

# For general text
general_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)

# For code
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language="python",
    chunk_size=1500,
    chunk_overlap=200
)

# For markdown (respects headers)
from langchain.text_splitter import MarkdownHeaderTextSplitter
headers_to_split = [
    ("#", "h1"),
    ("##", "h2"),
    ("###", "h3"),
]
md_splitter = MarkdownHeaderTextSplitter(headers_to_split)
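
To see why header awareness matters, here is a quick sketch that splits a made-up markdown snippet; each chunk carries its header path as metadata:

# Split a small markdown string and inspect the header metadata
sample_md = """# Handbook
## Vacation
Employees accrue 20 days per year.
## Equipment
Laptops are refreshed every 3 years.
"""

md_chunks = md_splitter.split_text(sample_md)
for chunk in md_chunks:
    print(chunk.metadata, "->", chunk.page_content[:40])
# e.g. {'h1': 'Handbook', 'h2': 'Vacation'} -> Employees accrue 20 days per year.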

Hybrid Search (Vector + Keyword)

# Hybrid search combines semantic and keyword matching
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires the rank_bm25 package
from langchain_community.vectorstores import Chroma

# Vector retriever (semantic)
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# BM25 retriever (keyword - good for exact matches)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5

# Ensemble - combines both with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # Favor semantic but include keyword matches
)
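
Querying the ensemble works like any other retriever. A quick sketch, with an illustrative query:

# Results from both retrievers are merged via weighted reciprocal rank fusion
docs = ensemble_retriever.invoke("What is the refund policy for enterprise customers?")
for doc in docs:
    print(doc.page_content[:80])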

Reranking for Better Results

# Reranking improves retrieval quality significantly
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

# Base retriever
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

# Cohere reranker
reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)

# Compressed retriever - retrieves 20, reranks to top 5
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)
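
Any of these retrievers slots straight into the chains shown earlier. For example, a sketch that swaps the reranking retriever into a RetrievalQA chain, reusing the llm from the RAG example above (the query is illustrative):

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,  # fetch 20, rerank to 5, then answer
    return_source_documents=True
)

result = qa_chain.invoke({"query": "What changed in the parental leave policy?"})
print(result["result"])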

Framework Selection Guide

Quick Decision Framework

  • Just need RAG? → LlamaIndex
  • Building agents with tools? → LangChain or LlamaIndex
  • .NET shop or Azure-first? → Semantic Kernel
  • Multi-model support needed? → LiteLLM + any framework
  • Production search system? → Haystack
  • Simple use case? → Raw API calls, skip the framework (see the sketch below)
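
For that last case, the provider SDK on its own is usually enough. A minimal sketch with the OpenAI Python SDK, assuming OPENAI_API_KEY is set in the environment:

# No framework - a single call with the OpenAI SDK
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this ticket in one sentence: ..."}
    ],
    temperature=0
)
print(response.choices[0].message.content)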

LiteLLM: The Universal Adapter

LiteLLM isn’t a full framework—it’s a unified interface to 100+ LLM providers. Incredibly useful for model flexibility:

# LiteLLM - same interface for any model
from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic - same interface
response = completion(
    model="claude-4-sonnet",
    messages=[{"role": "user", "content": "Hello"}]
)

# Google Gemini - same interface  
response = completion(
    model="gemini/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello"}]
)

# Azure OpenAI
response = completion(
    model="azure/gpt-4o-deployment",
    messages=[{"role": "user", "content": "Hello"}]
)

# Self-hosted Llama via Ollama
response = completion(
    model="ollama/llama4",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:11434"
)

Key Takeaways

  • Don’t over-engineer: Start simple. Use frameworks when you actually need them.
  • LlamaIndex for data: If your core problem is connecting LLMs to documents, start here.
  • LangChain for complexity: When you need chains, agents, and lots of integrations.
  • Retrieval quality matters most: Chunking, hybrid search, and reranking beat fancier models.
  • LiteLLM for flexibility: Add it as your LLM layer for easy model switching.

What’s Next

In Part 4, we’ll explore Agentic AI—moving beyond simple chains to autonomous agents that can plan, use tools, and accomplish complex tasks. This is where things get really interesting.


What frameworks are you using? Share your experiences on GitHub or in the comments.

