You’ve got the fundamentals. You can call an LLM API and get responses. Now what?
Raw API calls work for simple use cases, but real applications need more: document retrieval, conversation memory, structured outputs, error handling, tool use. That’s where frameworks come in.
I’ve built production systems with all the major frameworks. Here’s an honest assessment of each—what they’re good at, where they fall short, and when to use them.
Series Navigation: Part 1: GenAI Intro → Part 2: LLMs → Part 3: Frameworks (You are here) → Part 4: Agentic AI → Part 5: Building Agents → Part 6: Enterprise
The Framework Landscape (Mid-2025)
The ecosystem has matured significantly. Here’s the current state:
| Framework | Primary Strength | Best For | Learning Curve |
|---|---|---|---|
| LangChain | Comprehensive toolkit | Complex chains, agents, broad integrations | Moderate-High |
| LlamaIndex | Data/document handling | RAG systems, knowledge bases | Low-Moderate |
| Semantic Kernel | Enterprise .NET/Python | Enterprise apps, Microsoft stack | Moderate |
| Haystack | Production pipelines | Search, Q&A systems | Moderate |
| LiteLLM | Model abstraction | Multi-model applications | Low |
LangChain: The Swiss Army Knife
LangChain is the most popular framework, and for good reason—it has everything. But that “everything” can also be a curse.
When to Use LangChain
- Building complex multi-step chains
- Need broad integrations (vector stores, tools, APIs)
- Building agents with tool use
- Prototyping—it’s fast to get something working
When to Avoid
- Simple use cases (overkill, adds complexity)
- When you need fine-grained control over every API call
- Performance-critical paths (abstraction overhead)
# LangChain basics - conversational chain with memory
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain
# Initialize model
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
# Memory - keeps last 5 exchanges
memory = ConversationBufferWindowMemory(k=5)
# Create conversational chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # Shows chain execution
)
# Have a conversation
response1 = conversation.predict(input="Hi, I'm Alex. I'm building a fintech app.")
print(response1)
response2 = conversation.predict(input="What database should I use?")
print(response2) # Will remember you're building a fintech app
response3 = conversation.predict(input="What was my name again?")
print(response3) # Will remember you're Alex
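Structured outputs are another place a framework earns its keep. Here's a minimal sketch using LangChain's with_structured_output with a Pydantic schema (the TicketSummary model is one I made up for this example, not anything built in):
# Structured output sketch - TicketSummary is an illustrative schema
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class TicketSummary(BaseModel):
    title: str = Field(description="One-line summary of the ticket")
    priority: str = Field(description="low, medium, or high")
    tags: list[str] = Field(description="Short topical tags")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
structured_llm = llm.with_structured_output(TicketSummary)

# Returns a TicketSummary instance instead of free-form text
result = structured_llm.invoke("Customer can't log in after the 2.3 update; affects all SSO users.")
print(result.priority, result.tags)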
LangChain RAG Implementation
# Production-ready RAG with LangChain
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
def create_rag_system(pdf_paths: list[str]):
    """Create a RAG system from PDF documents."""
    # 1. Load documents
    documents = []
    for path in pdf_paths:
        loader = PyPDFLoader(path)
        documents.extend(loader.load())

    # 2. Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", " ", ""]
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split into {len(chunks)} chunks")

    # 3. Create embeddings and vector store
    embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )

    # 4. Create retrieval chain
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # or "map_reduce" for large docs
        retriever=vectorstore.as_retriever(
            search_type="mmr",  # Maximum Marginal Relevance
            search_kwargs={"k": 5}
        ),
        return_source_documents=True
    )

    return qa_chain
# Usage
qa = create_rag_system(["company_policies.pdf", "employee_handbook.pdf"])
result = qa.invoke({"query": "What is the vacation policy?"})
print(result["result"])
print(f"Sources: {[doc.metadata for doc in result['source_documents']]}")
LlamaIndex: The Data Specialist
LlamaIndex (formerly GPT Index) excels at one thing: connecting LLMs with your data. If your primary use case is RAG or building knowledge bases, LlamaIndex is often simpler than LangChain.
When to Use LlamaIndex
- Document Q&A systems
- Knowledge bases over structured/unstructured data
- When data quality and retrieval matter most
- Multi-document synthesis
# LlamaIndex - cleaner RAG implementation
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Configure defaults
Settings.llm = OpenAI(model="gpt-4o", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()
# Create index - handles chunking and embedding automatically
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings in the Q3 report?")
print(response)
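The defaults work, but the query engine exposes tuning knobs worth knowing about. A quick sketch with a couple of the ones I reach for first (the values here are illustrative, not recommendations):
# Tuning the query engine - values are illustrative
query_engine = index.as_query_engine(
    similarity_top_k=5,              # retrieve more chunks than the default
    response_mode="tree_summarize"   # synthesize an answer across many chunks
)
response = query_engine.query("Summarize the key risks across all reports")
print(response)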
Advanced LlamaIndex: Multi-Document Agents
# Document agents - one agent per document with a router
from llama_index.core import VectorStoreIndex, SummaryIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.agent.openai import OpenAIAgent
def create_document_agent(doc_path: str, doc_description: str):
    """Create query tools for a single document with multiple query modes."""
    # Load single document
    documents = SimpleDirectoryReader(input_files=[doc_path]).load_data()

    # Vector index for specific questions
    vector_index = VectorStoreIndex.from_documents(documents)

    # Summary index for high-level questions
    summary_index = SummaryIndex.from_documents(documents)

    # Unique tool-name prefix so tools from different documents don't collide
    prefix = doc_description.lower().replace(" ", "_")

    # Create tools for different query types
    vector_tool = QueryEngineTool(
        query_engine=vector_index.as_query_engine(),
        metadata=ToolMetadata(
            name=f"{prefix}_vector_search",
            description=f"Search for specific details in {doc_description}"
        )
    )
    summary_tool = QueryEngineTool(
        query_engine=summary_index.as_query_engine(),
        metadata=ToolMetadata(
            name=f"{prefix}_summary",
            description=f"Get a summary or high-level overview of {doc_description}"
        )
    )

    return [vector_tool, summary_tool]
# Create agents for multiple documents
tools = []
tools.extend(create_document_agent("financials_2024.pdf", "2024 Financial Report"))
tools.extend(create_document_agent("strategy_doc.pdf", "Company Strategy Document"))
# Router agent picks the right tool
agent = OpenAIAgent.from_tools(tools, verbose=True)
# Now it can answer questions across documents
response = agent.chat("Compare the 2024 revenue with the strategy goals")
Semantic Kernel: Enterprise & Microsoft Stack
Microsoft’s Semantic Kernel is the choice if you’re in a .NET shop or building on Azure. The Python SDK is solid too, with a focus on enterprise patterns.
# Semantic Kernel - plugins and planners
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.core_plugins import TextPlugin, TimePlugin
# Initialize kernel
kernel = sk.Kernel()
# Add AI service
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4",
        ai_model_id="gpt-4o"
    )
)
# Add built-in plugins
kernel.add_plugin(TextPlugin(), "text")
kernel.add_plugin(TimePlugin(), "time")
# Create a custom function (plugin)
summarize_prompt = """
Summarize the following text in {{$style}} style:
{{$input}}
Summary:
"""
summarize_function = kernel.add_function(
    plugin_name="writer",
    function_name="summarize",
    prompt=summarize_prompt
)
# Execute - kernel.invoke is a coroutine, so run this inside an async function
result = await kernel.invoke(
    summarize_function,
    input="Long article text here...",
    style="executive brief"
)
print(result)
Building RAG Right: Practical Patterns
Regardless of framework, RAG systems share common challenges. Here’s what works:
Chunking Strategy
# Smart chunking - respect document structure
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language

# For general text
general_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)

# For code
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1500,
    chunk_overlap=200
)
# For markdown (respects headers)
from langchain.text_splitter import MarkdownHeaderTextSplitter

headers_to_split = [
    ("#", "h1"),
    ("##", "h2"),
    ("###", "h3"),
]
md_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split)
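Header-based splitting pairs well with a second size-based pass, because a single section can still blow past your chunk limit. A small sketch of that two-stage pattern, assuming markdown_text holds your raw markdown:
# Two-stage split: by header first, then by size (markdown_text is assumed)
md_docs = md_splitter.split_text(markdown_text)      # one Document per section, headers kept in metadata
chunks = general_splitter.split_documents(md_docs)   # then enforce the character-level chunk size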
Hybrid Search (Vector + Keyword)
# Hybrid search combines semantic and keyword matching
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # needs the rank_bm25 package
from langchain_community.vectorstores import Chroma
# Vector retriever (semantic)
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# BM25 retriever (keyword - good for exact matches)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5
# Ensemble - combines both with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # Favor semantic but include keyword matches
)
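The ensemble merges the two ranked lists with reciprocal rank fusion, and you query it like any other retriever (the query string is just an example):
# Query the ensemble like any single retriever
docs = ensemble_retriever.invoke("error code E4021 during checkout")
for doc in docs:
    print(doc.metadata, doc.page_content[:80])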
Reranking for Better Results
# Reranking improves retrieval quality significantly
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
# Base retriever
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})
# Cohere reranker
reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)
# Compressed retriever - retrieves 20, reranks to top 5
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)
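Usage is the same as any other retriever. The trade-off is an extra call to the rerank API on every query, so budget for the added latency:
# Retrieve 20 candidates, rerank, get the top 5 back
docs = compression_retriever.invoke("What changed in the 2024 expense policy?")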
Framework Selection Guide
Quick Decision Framework
- Just need RAG? → LlamaIndex
- Building agents with tools? → LangChain or LlamaIndex
- .NET shop or Azure-first? → Semantic Kernel
- Multi-model support needed? → LiteLLM + any framework
- Production search system? → Haystack
- Simple use case? → Raw API calls, skip the framework
LiteLLM: The Universal Adapter
LiteLLM isn’t a full framework—it’s a unified interface to 100+ LLM providers. Incredibly useful for model flexibility:
# LiteLLM - same interface for any model
from litellm import completion
# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic - same interface
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}]
)

# Google Gemini - same interface
response = completion(
    model="gemini/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello"}]
)

# Azure OpenAI
response = completion(
    model="azure/gpt-4o-deployment",
    messages=[{"role": "user", "content": "Hello"}]
)

# Self-hosted Llama via Ollama
response = completion(
    model="ollama/llama4",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:11434"
)
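Whichever provider handled the request, the response comes back in the OpenAI format, so the downstream code never changes:
# Same response shape regardless of provider
print(response.choices[0].message.content)
print(response.usage.total_tokens)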
Key Takeaways
- Don’t over-engineer: Start simple. Use frameworks when you actually need them.
- LlamaIndex for data: If your core problem is connecting LLMs to documents, start here.
- LangChain for complexity: When you need chains, agents, and lots of integrations.
- Retrieval quality matters most: Chunking, hybrid search, and reranking beat fancier models.
- LiteLLM for flexibility: Add it as your LLM layer for easy model switching.
What’s Next
In Part 4, we’ll explore Agentic AI—moving beyond simple chains to autonomous agents that can plan, use tools, and accomplish complex tasks. This is where things get really interesting.
References & Further Reading
- LangChain Documentation – python.langchain.com
- LlamaIndex Documentation – docs.llamaindex.ai
- Semantic Kernel – learn.microsoft.com
- LiteLLM – docs.litellm.ai
- RAG Best Practices – Pinecone Guide
What frameworks are you using? Share your experiences on GitHub or in the comments.