You’ve got the fundamentals. You can call an LLM API and get responses. Now what?
Raw API calls work for simple use cases, but real applications need more: document retrieval, conversation memory, structured outputs, error handling, tool use. That’s where frameworks come in.
I’ve built production systems with all the major frameworks. Here’s an honest assessment of each—what they’re good at, where they fall short, and when to use them.
Series Navigation: Part 1: GenAI Intro → Part 2: LLMs → Part 3: Frameworks (You are here) → Part 4: Agentic AI → Part 5: Building Agents → Part 6: Enterprise
The Framework Landscape (Mid-2025)
The ecosystem has matured significantly. Here’s the current state:
| Framework | Primary Strength | Best For | Learning Curve |
|---|---|---|---|
| LangChain | Comprehensive toolkit | Complex chains, agents, broad integrations | Moderate-High |
| LlamaIndex | Data/document handling | RAG systems, knowledge bases | Low-Moderate |
| Semantic Kernel | Enterprise .NET/Python | Enterprise apps, Microsoft stack | Moderate |
| Haystack | Production pipelines | Search, Q&A systems | Moderate |
| LiteLLM | Model abstraction | Multi-model applications | Low |
LangChain: The Swiss Army Knife
LangChain is the most popular framework, and for good reason—it has everything. But that “everything” can also be a curse.
When to Use LangChain
- Building complex multi-step chains
- Need broad integrations (vector stores, tools, APIs)
- Building agents with tool use
- Prototyping—it’s fast to get something working
When to Avoid
- Simple use cases (overkill, adds complexity)
- When you need fine-grained control over every API call
- Performance-critical paths (abstraction overhead)
# LangChain basics - conversational chain with memory
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain
# Initialize model
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
# Memory - keeps last 5 exchanges
memory = ConversationBufferWindowMemory(k=5)
# Create conversational chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # Shows chain execution
)
# Have a conversation
response1 = conversation.predict(input="Hi, I'm Alex. I'm building a fintech app.")
print(response1)
response2 = conversation.predict(input="What database should I use?")
print(response2) # Will remember you're building a fintech app
response3 = conversation.predict(input="What was my name again?")
print(response3) # Will remember you're Alex
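Structured outputs are another place a framework earns its keep. Here's a minimal sketch using LangChain's with_structured_output with a Pydantic schema (the TicketSummary model is one I made up for this example, not anything built in):
# Structured output sketch - TicketSummary is an illustrative schema
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class TicketSummary(BaseModel):
    title: str = Field(description="One-line summary of the ticket")
    priority: str = Field(description="low, medium, or high")
    tags: list[str] = Field(description="Short topical tags")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
structured_llm = llm.with_structured_output(TicketSummary)

# Returns a TicketSummary instance instead of free-form text
result = structured_llm.invoke("Customer can't log in after the 2.3 update; affects all SSO users.")
print(result.priority, result.tags)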
LangChain RAG Implementation
# Production-ready RAG with LangChain
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
def create_rag_system(pdf_paths: list[str]):
    """Create a RAG system from PDF documents."""
    # 1. Load documents
    documents = []
    for path in pdf_paths:
        loader = PyPDFLoader(path)
        documents.extend(loader.load())

    # 2. Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", " ", ""]
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split into {len(chunks)} chunks")

    # 3. Create embeddings and vector store
    embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )

    # 4. Create retrieval chain
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # or "map_reduce" for large docs
        retriever=vectorstore.as_retriever(
            search_type="mmr",  # Maximum Marginal Relevance
            search_kwargs={"k": 5}
        ),
        return_source_documents=True
    )

    return qa_chain
# Usage
qa = create_rag_system(["company_policies.pdf", "employee_handbook.pdf"])
result = qa.invoke({"query": "What is the vacation policy?"})
print(result["result"])
print(f"Sources: {[doc.metadata for doc in result['source_documents']]}")
LlamaIndex: The Data Specialist
LlamaIndex (formerly GPT Index) excels at one thing: connecting LLMs with your data. If your primary use case is RAG or building knowledge bases, LlamaIndex is often simpler than LangChain.
When to Use LlamaIndex
- Document Q&A systems
- Knowledge bases over structured/unstructured data
- When data quality and retrieval matter most
- Multi-document synthesis
# LlamaIndex - cleaner RAG implementation
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Configure defaults
Settings.llm = OpenAI(model="gpt-4o", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()
# Create index - handles chunking and embedding automatically
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings in the Q3 report?")
print(response)
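The defaults work, but the query engine exposes tuning knobs worth knowing about. A quick sketch with a couple of the ones I reach for first (the values here are illustrative, not recommendations):
# Tuning the query engine - values are illustrative
query_engine = index.as_query_engine(
    similarity_top_k=5,              # retrieve more chunks than the default
    response_mode="tree_summarize"   # synthesize an answer across many chunks
)
response = query_engine.query("Summarize the key risks across all reports")
print(response)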
Advanced LlamaIndex: Multi-Document Agents
# Document agents - one agent per document with a router
from llama_index.core import VectorStoreIndex, SummaryIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.agent.openai import OpenAIAgent
def create_document_agent(doc_path: str, doc_description: str):
    """Create query tools for a single document with multiple query modes."""
    # Load single document
    documents = SimpleDirectoryReader(input_files=[doc_path]).load_data()

    # Vector index for specific questions
    vector_index = VectorStoreIndex.from_documents(documents)

    # Summary index for high-level questions
    summary_index = SummaryIndex.from_documents(documents)

    # Unique tool-name prefix so tools from different documents don't collide
    prefix = doc_description.lower().replace(" ", "_")

    # Create tools for different query types
    vector_tool = QueryEngineTool(
        query_engine=vector_index.as_query_engine(),
        metadata=ToolMetadata(
            name=f"{prefix}_vector_search",
            description=f"Search for specific details in {doc_description}"
        )
    )
    summary_tool = QueryEngineTool(
        query_engine=summary_index.as_query_engine(),
        metadata=ToolMetadata(
            name=f"{prefix}_summary",
            description=f"Get a summary or high-level overview of {doc_description}"
        )
    )

    return [vector_tool, summary_tool]
# Create agents for multiple documents
tools = []
tools.extend(create_document_agent("financials_2024.pdf", "2024 Financial Report"))
tools.extend(create_document_agent("strategy_doc.pdf", "Company Strategy Document"))
# Router agent picks the right tool
agent = OpenAIAgent.from_tools(tools, verbose=True)
# Now it can answer questions across documents
response = agent.chat("Compare the 2024 revenue with the strategy goals")
Semantic Kernel: Enterprise & Microsoft Stack
Microsoft’s Semantic Kernel is the choice if you’re in a .NET shop or building on Azure. The Python SDK is solid too, with a focus on enterprise patterns.
# Semantic Kernel - plugins and planners
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.core_plugins import TextPlugin, TimePlugin
# Initialize kernel
kernel = sk.Kernel()
# Add AI service
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4",
        ai_model_id="gpt-4o"
    )
)
# Add built-in plugins
kernel.add_plugin(TextPlugin(), "text")
kernel.add_plugin(TimePlugin(), "time")
# Create a custom function (plugin)
summarize_prompt = """
Summarize the following text in {{$style}} style:
{{$input}}
Summary:
"""
summarize_function = kernel.add_function(
    plugin_name="writer",
    function_name="summarize",
    prompt=summarize_prompt
)
# Execute - kernel.invoke is a coroutine, so run this inside an async function
result = await kernel.invoke(
    summarize_function,
    input="Long article text here...",
    style="executive brief"
)
print(result)
Building RAG Right: Practical Patterns
Regardless of framework, RAG systems share common challenges. Here’s what works:
Chunking Strategy
# Smart chunking - respect document structure
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language

# For general text
general_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)

# For code
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1500,
    chunk_overlap=200
)
# For markdown (respects headers)
from langchain.text_splitter import MarkdownHeaderTextSplitter

headers_to_split = [
    ("#", "h1"),
    ("##", "h2"),
    ("###", "h3"),
]
md_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split)
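Header-based splitting pairs well with a second size-based pass, because a single section can still blow past your chunk limit. A small sketch of that two-stage pattern, assuming markdown_text holds your raw markdown:
# Two-stage split: by header first, then by size (markdown_text is assumed)
md_docs = md_splitter.split_text(markdown_text)      # one Document per section, headers kept in metadata
chunks = general_splitter.split_documents(md_docs)   # then enforce the character-level chunk size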
Hybrid Search (Vector + Keyword)
# Hybrid search combines semantic and keyword matching
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # needs the rank_bm25 package
from langchain_community.vectorstores import Chroma
# Vector retriever (semantic)
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# BM25 retriever (keyword - good for exact matches)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5
# Ensemble - combines both with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # Favor semantic but include keyword matches
)
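The ensemble merges the two ranked lists with reciprocal rank fusion, and you query it like any other retriever (the query string is just an example):
# Query the ensemble like any single retriever
docs = ensemble_retriever.invoke("error code E4021 during checkout")
for doc in docs:
    print(doc.metadata, doc.page_content[:80])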
Reranking for Better Results
# Reranking improves retrieval quality significantly
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
# Base retriever
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})
# Cohere reranker
reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)
# Compressed retriever - retrieves 20, reranks to top 5
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)
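Usage is the same as any other retriever. The trade-off is an extra call to the rerank API on every query, so budget for the added latency:
# Retrieve 20 candidates, rerank, get the top 5 back
docs = compression_retriever.invoke("What changed in the 2024 expense policy?")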
Framework Selection Guide
Quick Decision Framework
- Just need RAG? → LlamaIndex
- Building agents with tools? → LangChain or LlamaIndex
- .NET shop or Azure-first? → Semantic Kernel
- Multi-model support needed? → LiteLLM + any framework
- Production search system? → Haystack
- Simple use case? → Raw API calls, skip the framework
LiteLLM: The Universal Adapter
LiteLLM isn’t a full framework—it’s a unified interface to 100+ LLM providers. Incredibly useful for model flexibility:
# LiteLLM - same interface for any model
from litellm import completion
# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic - same interface
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}]
)

# Google Gemini - same interface
response = completion(
    model="gemini/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello"}]
)

# Azure OpenAI
response = completion(
    model="azure/gpt-4o-deployment",
    messages=[{"role": "user", "content": "Hello"}]
)

# Self-hosted Llama via Ollama
response = completion(
    model="ollama/llama4",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:11434"
)
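Whichever provider handled the request, the response comes back in the OpenAI format, so the downstream code never changes:
# Same response shape regardless of provider
print(response.choices[0].message.content)
print(response.usage.total_tokens)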
Key Takeaways
- Don’t over-engineer: Start simple. Use frameworks when you actually need them.
- LlamaIndex for data: If your core problem is connecting LLMs to documents, start here.
- LangChain for complexity: When you need chains, agents, and lots of integrations.
- Retrieval quality matters most: Chunking, hybrid search, and reranking beat fancier models.
- LiteLLM for flexibility: Add it as your LLM layer for easy model switching.
What’s Next
In Part 4, we’ll explore Agentic AI—moving beyond simple chains to autonomous agents that can plan, use tools, and accomplish complex tasks. This is where things get really interesting.
References & Further Reading
- LangChain Documentation – python.langchain.com
- LlamaIndex Documentation – docs.llamaindex.ai
- Semantic Kernel – learn.microsoft.com
- LiteLLM – docs.litellm.ai
- RAG Best Practices – Pinecone Guide
What frameworks are you using? Share your experiences on GitHub or in the comments.