Introduction
Chain-of-thought (CoT) prompting dramatically improves LLM performance on complex reasoning tasks. Instead of asking for a direct answer, you prompt the model to show its reasoning step by step. This simple technique can boost accuracy on math problems from 17% to 78%, and similar gains appear across logical reasoning, code generation, and multi-step analysis. But CoT isn’t magic—it works best for certain problem types and can actually hurt performance on simple tasks. This guide covers practical chain-of-thought techniques: zero-shot CoT with “let’s think step by step”, few-shot CoT with worked examples, self-consistency for improved reliability, and building CoT pipelines that know when to use reasoning chains and when to skip them.
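The entire zero-shot trick is a suffix on the prompt. Here is a minimal sketch, assuming the openai v1 Python SDK and an OPENAI_API_KEY in the environment (the marble problem is only an illustration):

from openai import OpenAI  # assumes the official openai v1 SDK

client = OpenAI()

problem = "A jar holds 3 red and 5 blue marbles. I add 4 more red marbles. What fraction of the marbles are now red?"

# Appending the trigger phrase is the whole technique
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{problem}\n\nLet's think step by step."}],
    temperature=0,
)
print(response.choices[0].message.content)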

Zero-Shot Chain-of-Thought
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CoTResponse:
    """Response with chain-of-thought reasoning."""
    reasoning: str
    answer: str
    confidence: float = 0.0


class ZeroShotCoT:
    """Zero-shot chain-of-thought prompting."""

    def __init__(self, client: Any, model: str = "gpt-4o-mini"):
        self.client = client
        self.model = model
        # Different CoT triggers for different tasks
        self.triggers = {
            "default": "Let's think step by step.",
            "math": "Let's solve this step by step, showing all calculations.",
            "logic": "Let's reason through this carefully, considering each possibility.",
            "code": "Let's break this down into smaller parts and solve each one.",
            "analysis": "Let's analyze this systematically, examining each factor."
        }

    async def solve(
        self,
        problem: str,
        task_type: str = "default",
        system_prompt: Optional[str] = None
    ) -> CoTResponse:
        """Solve a problem using zero-shot CoT."""
        trigger = self.triggers.get(task_type, self.triggers["default"])

        # First pass: generate the reasoning
        reasoning_prompt = f"""{problem}

{trigger}"""

        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": reasoning_prompt})

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0
        )
        reasoning = response.choices[0].message.content

        # Second pass: extract the final answer
        extraction_prompt = f"""Based on the following reasoning, what is the final answer?
Provide only the answer, nothing else.

Reasoning:
{reasoning}

Final answer:"""

        answer_response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": extraction_prompt}],
            temperature=0,
            max_tokens=100
        )
        answer = answer_response.choices[0].message.content.strip()

        return CoTResponse(
            reasoning=reasoning,
            answer=answer
        )

    async def solve_with_verification(
        self,
        problem: str,
        task_type: str = "default"
    ) -> CoTResponse:
        """Solve with a self-verification step."""
        # Get the initial solution
        initial = await self.solve(problem, task_type)

        # Ask the model to verify its own solution
        verification_prompt = f"""Problem: {problem}

Proposed solution:
{initial.reasoning}

Answer: {initial.answer}

Please verify this solution. Is the reasoning correct? Is the answer correct?
If there are any errors, provide the corrected reasoning and answer.
If correct, confirm the answer."""

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": verification_prompt}],
            temperature=0
        )
        verification = response.choices[0].message.content

        # Check whether the verification found errors
        if "error" in verification.lower() or "incorrect" in verification.lower():
            # Extract the corrected answer
            return CoTResponse(
                reasoning=f"{initial.reasoning}\n\nVerification:\n{verification}",
                answer=self._extract_answer(verification),
                confidence=0.7
            )

        return CoTResponse(
            reasoning=initial.reasoning,
            answer=initial.answer,
            confidence=0.9
        )

    def _extract_answer(self, text: str) -> str:
        """Extract the answer from verification text."""
        # Look for common answer patterns
        patterns = [
            r"correct answer is[:\s]+(.+?)(?:\.|$)",
            r"answer should be[:\s]+(.+?)(?:\.|$)",
            r"final answer[:\s]+(.+?)(?:\.|$)"
        ]
        for pattern in patterns:
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                return match.group(1).strip()
        return text.split("\n")[-1].strip()
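A quick usage sketch for the class above, assuming AsyncOpenAI from the openai SDK (the recipe problem is only an illustration):

import asyncio
from openai import AsyncOpenAI  # assumed async client

async def main():
    cot = ZeroShotCoT(AsyncOpenAI())
    result = await cot.solve(
        "A recipe needs 3/4 cup of sugar per batch. How much sugar is needed for 5 batches?",
        task_type="math",
    )
    print(result.reasoning)
    print("Answer:", result.answer)

asyncio.run(main())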
Few-Shot Chain-of-Thought
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CoTExample:
    """A chain-of-thought example."""
    problem: str
    reasoning: str
    answer: str


class FewShotCoT:
    """Few-shot chain-of-thought with worked examples."""

    def __init__(self, client: Any, model: str = "gpt-4o-mini"):
        self.client = client
        self.model = model
        self.examples: dict[str, list[CoTExample]] = {}
        self._load_default_examples()

    def _load_default_examples(self):
        """Load default CoT examples for common task types."""
        # Math examples
        self.examples["math"] = [
            CoTExample(
                problem="If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is its average speed for the entire journey?",
                reasoning="""Step 1: Calculate the total distance traveled.
Total distance = 120 miles + 180 miles = 300 miles
Step 2: Calculate the total time taken.
Total time = 2 hours + 3 hours = 5 hours
Step 3: Calculate the average speed.
Average speed = Total distance / Total time
Average speed = 300 miles / 5 hours = 60 miles per hour""",
                answer="60 miles per hour"
            ),
            CoTExample(
                problem="A store offers a 20% discount on an $80 item, then applies an additional 10% off the discounted price. What is the final price?",
                reasoning="""Step 1: Calculate the price after the first discount.
First discount = 20% of $80 = 0.20 × $80 = $16
Price after first discount = $80 - $16 = $64
Step 2: Calculate the price after the second discount.
Second discount = 10% of $64 = 0.10 × $64 = $6.40
Final price = $64 - $6.40 = $57.60""",
                answer="$57.60"
            )
        ]

        # Logic examples
        self.examples["logic"] = [
            CoTExample(
                problem="All roses are flowers. Some flowers fade quickly. Can we conclude that some roses fade quickly?",
                reasoning="""Step 1: Identify the premises.
Premise 1: All roses are flowers (every rose is a flower).
Premise 2: Some flowers fade quickly (at least one flower fades quickly).
Step 2: Analyze the logical relationship.
From Premise 1, we know roses are a subset of flowers.
From Premise 2, we know some flowers (not necessarily roses) fade quickly.
Step 3: Check whether the conclusion follows.
The flowers that fade quickly might be roses, or they might be other flowers like tulips or daisies.
We cannot determine which specific flowers fade quickly.
Step 4: Conclusion.
No, we cannot conclude that some roses fade quickly. The flowers that fade quickly might not include any roses.""",
                answer="No, we cannot conclude this"
            )
        ]

        # Code examples
        self.examples["code"] = [
            CoTExample(
                problem="Write a function to find the second largest number in a list without sorting.",
                reasoning="""Step 1: Understand the requirements.
- Find the second largest number
- Cannot use sorting
- Need to handle edge cases (list too small, duplicates)
Step 2: Design the approach.
- Track two variables: largest and second_largest
- Iterate through the list once
- Update both variables as we find larger numbers
Step 3: Handle edge cases.
- If the list has fewer than 2 elements, return None
- If all elements are the same, return None (no second largest)
Step 4: Implement the solution.
def find_second_largest(nums):
    if len(nums) < 2:
        return None
    largest = second = float('-inf')
    for num in nums:
        if num > largest:
            second = largest
            largest = num
        elif num > second and num != largest:
            second = num
    return second if second != float('-inf') else None""",
                answer="See implementation above"
            )
        ]

    def add_example(self, task_type: str, example: CoTExample) -> None:
        """Add a custom example."""
        if task_type not in self.examples:
            self.examples[task_type] = []
        self.examples[task_type].append(example)

    def _format_examples(self, task_type: str, num_examples: int = 2) -> str:
        """Format examples for the prompt."""
        examples = self.examples.get(task_type, self.examples.get("math", []))
        selected = examples[:num_examples]
        formatted = []
        for i, ex in enumerate(selected, 1):
            formatted.append(f"""Example {i}:
Problem: {ex.problem}
Reasoning:
{ex.reasoning}
Answer: {ex.answer}
""")
        return "\n".join(formatted)

    async def solve(
        self,
        problem: str,
        task_type: str = "math",
        num_examples: int = 2
    ) -> CoTResponse:
        """Solve using few-shot CoT."""
        examples_text = self._format_examples(task_type, num_examples)

        prompt = f"""Here are some examples of how to solve problems step by step:

{examples_text}
Now solve this problem using the same step-by-step approach:

Problem: {problem}

Reasoning:"""

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0
        )
        full_response = response.choices[0].message.content

        # Parse reasoning and answer
        if "Answer:" in full_response:
            parts = full_response.split("Answer:")
            reasoning = parts[0].strip()
            answer = parts[1].strip()
        else:
            reasoning = full_response
            answer = full_response.split("\n")[-1].strip()

        return CoTResponse(reasoning=reasoning, answer=answer)
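Adding a domain-specific worked example and then solving with it might look like this (a sketch; the "finance" task type and the simple-interest example are hypothetical, not built in):

from openai import AsyncOpenAI  # assumed async client

few_shot = FewShotCoT(AsyncOpenAI())

# Hypothetical domain example: simple-interest word problems
few_shot.add_example("finance", CoTExample(
    problem="An account pays 5% simple interest per year. How much interest does $200 earn in 3 years?",
    reasoning="Step 1: Interest per year = 5% of $200 = $10.\nStep 2: Interest over 3 years = $10 × 3 = $30.",
    answer="$30",
))

# Inside an async function:
# result = await few_shot.solve(
#     "How much simple interest does $500 earn in 2 years at 4% per year?",
#     task_type="finance",
# )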
Self-Consistency
import re
from collections import Counter
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ConsistencyResult:
    """Result of self-consistency sampling."""
    answer: str
    confidence: float
    reasoning_paths: list[str]
    vote_distribution: dict[str, int]


class SelfConsistencyCoT:
    """Self-consistency with chain-of-thought."""

    def __init__(
        self,
        client: Any,
        model: str = "gpt-4o-mini",
        num_samples: int = 5
    ):
        self.client = client
        self.model = model
        self.num_samples = num_samples
        self.cot = ZeroShotCoT(client, model)

    async def solve(
        self,
        problem: str,
        task_type: str = "default",
        temperature: float = 0.7
    ) -> ConsistencyResult:
        """Solve with self-consistency sampling."""
        # Generate multiple reasoning paths
        responses = []
        for _ in range(self.num_samples):
            response = await self._generate_path(problem, task_type, temperature)
            responses.append(response)

        # Extract answers and count votes
        answers = [r.answer for r in responses]
        vote_counts = Counter(answers)

        # Take the majority answer
        majority_answer, majority_count = vote_counts.most_common(1)[0]
        confidence = majority_count / self.num_samples

        # Keep the reasoning paths that led to the majority answer
        majority_paths = [
            r.reasoning for r in responses
            if r.answer == majority_answer
        ]

        return ConsistencyResult(
            answer=majority_answer,
            confidence=confidence,
            reasoning_paths=majority_paths,
            vote_distribution=dict(vote_counts)
        )

    async def _generate_path(
        self,
        problem: str,
        task_type: str,
        temperature: float
    ) -> CoTResponse:
        """Generate a single reasoning path."""
        trigger = self.cot.triggers.get(task_type, self.cot.triggers["default"])

        prompt = f"""{problem}

{trigger}

After your reasoning, provide your final answer on a new line starting with "Answer:"."""

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature
        )
        full_response = response.choices[0].message.content

        # Parse the response
        if "Answer:" in full_response:
            parts = full_response.split("Answer:")
            reasoning = parts[0].strip()
            answer = parts[1].strip().split("\n")[0].strip()
        else:
            reasoning = full_response
            answer = self._extract_final_answer(full_response)

        return CoTResponse(reasoning=reasoning, answer=answer)

    def _extract_final_answer(self, text: str) -> str:
        """Extract the answer from reasoning text."""
        # Look for common answer patterns
        patterns = [
            r"(?:the answer is|therefore|thus|so)[:\s]+(.+?)(?:\.|$)",
            r"(?:=\s*)(\d+(?:\.\d+)?)",
            r"(?:final answer)[:\s]+(.+?)(?:\.|$)"
        ]
        for pattern in patterns:
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                return match.group(1).strip()
        # Return the last non-empty line as a fallback
        lines = [line.strip() for line in text.split("\n") if line.strip()]
        return lines[-1] if lines else ""
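# --- Usage sketch: assumes AsyncOpenAI from the openai SDK; printed values are illustrative ---
async def demo_self_consistency():
    from openai import AsyncOpenAI  # assumed async client
    sc = SelfConsistencyCoT(AsyncOpenAI(), num_samples=5)
    result = await sc.solve(
        "If 8 workers build a wall in 10 days, how long would 5 workers take?",
        task_type="math",
    )
    print(result.vote_distribution)  # e.g. {"16 days": 4, "12.5 days": 1} (illustrative)
    if result.confidence < 0.6:      # fewer than 3 of the 5 samples agreed
        print("Low agreement; consider more samples or human review.")
    return result.answer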
class WeightedSelfConsistency:
    """Self-consistency with confidence-weighted voting."""

    def __init__(
        self,
        client: Any,
        model: str = "gpt-4o-mini",
        num_samples: int = 5
    ):
        self.client = client
        self.model = model
        self.num_samples = num_samples

    async def solve(self, problem: str) -> ConsistencyResult:
        """Solve with confidence-weighted voting."""
        responses = []
        for _ in range(self.num_samples):
            response = await self._generate_with_confidence(problem)
            responses.append(response)

        # Weighted voting: each answer accumulates the confidence of the
        # samples that produced it
        answer_weights: dict[str, float] = {}
        answer_paths: dict[str, list[str]] = {}

        for resp in responses:
            answer = resp["answer"]
            confidence = resp["confidence"]
            reasoning = resp["reasoning"]
            if answer not in answer_weights:
                answer_weights[answer] = 0.0
                answer_paths[answer] = []
            answer_weights[answer] += confidence
            answer_paths[answer].append(reasoning)

        # Pick the answer with the highest total weight
        best_answer = max(answer_weights, key=answer_weights.get)
        total_weight = sum(answer_weights.values())

        return ConsistencyResult(
            answer=best_answer,
            confidence=answer_weights[best_answer] / total_weight,
            reasoning_paths=answer_paths[best_answer],
            vote_distribution={k: int(v * 100) for k, v in answer_weights.items()}
        )

    async def _generate_with_confidence(self, problem: str) -> dict:
        """Generate a response with a self-reported confidence score."""
        prompt = f"""{problem}

Think step by step to solve this problem.
After your reasoning, provide:
1. Your final answer
2. Your confidence level (0-100%)

Format:
Reasoning: [your step-by-step reasoning]
Answer: [your answer]
Confidence: [0-100]%"""

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        text = response.choices[0].message.content

        # Parse the structured response
        reasoning = ""
        answer = ""
        confidence = 0.5

        if "Reasoning:" in text:
            reasoning = text.split("Reasoning:")[1].split("Answer:")[0].strip()
        if "Answer:" in text:
            answer_part = text.split("Answer:")[1]
            if "Confidence:" in answer_part:
                answer = answer_part.split("Confidence:")[0].strip()
            else:
                answer = answer_part.strip()
        conf_match = re.search(r"Confidence:\s*(\d+)", text)
        if conf_match:
            confidence = int(conf_match.group(1)) / 100

        return {
            "reasoning": reasoning,
            "answer": answer,
            "confidence": confidence
        }
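The weighting is simple arithmetic: each candidate answer accumulates the self-reported confidence of every sample that produced it, and the winner's score is normalized by the total. As a hypothetical illustration, if three samples answer "42" with confidences 0.9, 0.8, and 0.6 while two answer "40" with 0.95 and 0.9, the weighted scores are 2.3 / 4.15 ≈ 0.55 for "42" and 1.85 / 4.15 ≈ 0.45 for "40". The majority answer still wins, but the reported confidence is lower than the raw 3-of-5 vote (0.6) would suggest, which is exactly the extra signal weighted voting is meant to provide.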
Adaptive Chain-of-Thought
import re
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ProblemComplexity(Enum):
    """Complexity levels for problems."""
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


class AdaptiveCoT:
    """Adaptively apply CoT based on problem complexity."""

    def __init__(self, client: Any, model: str = "gpt-4o-mini"):
        self.client = client
        self.model = model
        self.zero_shot = ZeroShotCoT(client, model)
        self.few_shot = FewShotCoT(client, model)
        self.self_consistency = SelfConsistencyCoT(client, model)

    async def solve(self, problem: str, task_type: str = "default") -> CoTResponse:
        """Solve using the appropriate CoT strategy."""
        # Assess complexity first
        complexity = await self._assess_complexity(problem)

        if complexity == ProblemComplexity.SIMPLE:
            # Direct answer without CoT
            return await self._direct_answer(problem)
        elif complexity == ProblemComplexity.MODERATE:
            # Zero-shot CoT
            return await self.zero_shot.solve(problem, task_type)
        else:  # COMPLEX
            # Self-consistency with multiple samples
            result = await self.self_consistency.solve(problem, task_type)
            return CoTResponse(
                reasoning=result.reasoning_paths[0] if result.reasoning_paths else "",
                answer=result.answer,
                confidence=result.confidence
            )

    async def _assess_complexity(self, problem: str) -> ProblemComplexity:
        """Assess problem complexity."""
        prompt = f"""Assess the complexity of this problem on a scale of 1-3:
1 = Simple (can be answered directly, single step)
2 = Moderate (requires some reasoning, 2-3 steps)
3 = Complex (requires multi-step reasoning, calculations, or analysis)

Problem: {problem}

Respond with only the number (1, 2, or 3):"""

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=10
        )
        try:
            level = int(response.choices[0].message.content.strip()[0])
        except (ValueError, IndexError):
            return ProblemComplexity.MODERATE

        if level == 1:
            return ProblemComplexity.SIMPLE
        elif level == 2:
            return ProblemComplexity.MODERATE
        return ProblemComplexity.COMPLEX

    async def _direct_answer(self, problem: str) -> CoTResponse:
        """Get a direct answer without CoT."""
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": problem}],
            temperature=0
        )
        answer = response.choices[0].message.content
        return CoTResponse(
            reasoning="Direct answer (simple problem)",
            answer=answer,
            confidence=0.8
        )
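# --- Usage sketch: the router is transparent to callers (assumes AsyncOpenAI; run inside an async function) ---
#     adaptive = AdaptiveCoT(AsyncOpenAI())
#     await adaptive.solve("What is the capital of France?")   # likely assessed as simple: direct answer
#     await adaptive.solve(
#         "A tank fills at 3 L/min while draining at 1 L/min. How long to fill 240 L?",
#         task_type="math",
#     )                                                         # likely moderate or complex: CoT or self-consistency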
class TreeOfThought:
    """Tree-of-thought reasoning for complex problems."""

    def __init__(
        self,
        client: Any,
        model: str = "gpt-4o-mini",
        branching_factor: int = 3,
        max_depth: int = 3
    ):
        self.client = client
        self.model = model
        self.branching_factor = branching_factor
        self.max_depth = max_depth

    async def solve(self, problem: str) -> CoTResponse:
        """Solve using tree-of-thought."""
        # Generate the initial thoughts
        thoughts = await self._generate_thoughts(problem, "", 0)

        # Evaluate and expand the most promising thoughts
        best_path = await self._search(problem, thoughts)

        # Extract the final answer
        answer = await self._extract_answer(problem, best_path)

        return CoTResponse(
            reasoning="\n".join(best_path),
            answer=answer,
            confidence=0.85
        )

    async def _generate_thoughts(
        self,
        problem: str,
        context: str,
        depth: int
    ) -> list[str]:
        """Generate possible next thoughts."""
        prompt = f"""Problem: {problem}

Current reasoning so far:
{context if context else "(Starting fresh)"}

Generate {self.branching_factor} different possible next steps in the reasoning.
Each step should be a distinct approach or consideration.
Format as a numbered list."""

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        text = response.choices[0].message.content

        # Parse the numbered list
        thoughts = re.findall(r'\d+\.\s*(.+?)(?=\d+\.|$)', text, re.DOTALL)
        return [t.strip() for t in thoughts[:self.branching_factor]]

    async def _evaluate_thought(
        self,
        problem: str,
        thought_path: list[str]
    ) -> float:
        """Evaluate how promising a thought path is."""
        prompt = f"""Problem: {problem}

Reasoning path:
{chr(10).join(thought_path)}

Rate how promising this reasoning path is for solving the problem.
Consider: Is it on the right track? Does it make progress?
Rate from 0 to 10, where 10 is very promising.
Respond with only a number:"""

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=10
        )
        try:
            return float(response.choices[0].message.content.strip()) / 10
        except ValueError:
            return 0.5

    async def _search(
        self,
        problem: str,
        initial_thoughts: list[str]
    ) -> list[str]:
        """Search for the best reasoning path using a simple beam search."""
        beam = [[t] for t in initial_thoughts]

        for depth in range(1, self.max_depth):
            candidates = []
            for path in beam:
                # Generate the next thoughts for this path
                context = "\n".join(path)
                next_thoughts = await self._generate_thoughts(problem, context, depth)
                for thought in next_thoughts:
                    new_path = path + [thought]
                    score = await self._evaluate_thought(problem, new_path)
                    candidates.append((new_path, score))
            # Keep only the top-scoring paths
            candidates.sort(key=lambda x: x[1], reverse=True)
            beam = [c[0] for c in candidates[:self.branching_factor]]

        return beam[0] if beam else []

    async def _extract_answer(self, problem: str, path: list[str]) -> str:
        """Extract the final answer from a reasoning path."""
        prompt = f"""Problem: {problem}

Complete reasoning:
{chr(10).join(path)}

Based on this reasoning, what is the final answer?
Provide only the answer:"""

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=100
        )
        return response.choices[0].message.content.strip()
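Tree-of-thought is the most expensive strategy here: each expansion issues one generation call per beam path plus one evaluation call per candidate thought, so keep branching_factor and max_depth small. A usage sketch, assuming AsyncOpenAI (the 24-style puzzle is only an illustration):

import asyncio
from openai import AsyncOpenAI  # assumed async client

async def main():
    # Smaller tree to limit API calls: 2 branches, 2 levels deep
    tot = TreeOfThought(AsyncOpenAI(), branching_factor=2, max_depth=2)
    result = await tot.solve(
        "Using the numbers 4, 4, 6, and 8 exactly once, reach 24 with +, -, ×, and ÷."
    )
    print(result.reasoning)
    print("Answer:", result.answer)

asyncio.run(main())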
Production CoT Service
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional

app = FastAPI()

# Initialize these components with a real LLM client at startup
zero_shot_cot = None
few_shot_cot = None
self_consistency = None
adaptive_cot = None


class SolveRequest(BaseModel):
    problem: str
    task_type: str = "default"
    method: str = "adaptive"
    num_samples: int = 5


class AddExampleRequest(BaseModel):
    task_type: str
    problem: str
    reasoning: str
    answer: str


@app.post("/v1/solve")
async def solve_problem(request: SolveRequest):
    """Solve a problem using chain-of-thought."""
    if request.method == "zero_shot":
        result = await zero_shot_cot.solve(request.problem, request.task_type)
    elif request.method == "few_shot":
        result = await few_shot_cot.solve(request.problem, request.task_type)
    elif request.method == "self_consistency":
        sc_result = await self_consistency.solve(
            request.problem,
            request.task_type
        )
        result = CoTResponse(
            reasoning=sc_result.reasoning_paths[0] if sc_result.reasoning_paths else "",
            answer=sc_result.answer,
            confidence=sc_result.confidence
        )
    elif request.method == "adaptive":
        result = await adaptive_cot.solve(request.problem, request.task_type)
    else:
        raise HTTPException(status_code=400, detail=f"Unknown method: {request.method}")

    return {
        "answer": result.answer,
        "reasoning": result.reasoning,
        "confidence": result.confidence,
        "method": request.method
    }


@app.post("/v1/solve/consistency")
async def solve_with_consistency(request: SolveRequest):
    """Solve with self-consistency sampling."""
    result = await self_consistency.solve(
        request.problem,
        request.task_type
    )
    return {
        "answer": result.answer,
        "confidence": result.confidence,
        "vote_distribution": result.vote_distribution,
        "num_reasoning_paths": len(result.reasoning_paths),
        "sample_reasoning": result.reasoning_paths[0] if result.reasoning_paths else None
    }


@app.post("/v1/examples")
async def add_example(request: AddExampleRequest):
    """Add a custom CoT example."""
    example = CoTExample(
        problem=request.problem,
        reasoning=request.reasoning,
        answer=request.answer
    )
    few_shot_cot.add_example(request.task_type, example)
    return {
        "status": "added",
        "task_type": request.task_type,
        "total_examples": len(few_shot_cot.examples.get(request.task_type, []))
    }


@app.get("/v1/examples/{task_type}")
async def list_examples(task_type: str):
    """List examples for a task type."""
    examples = few_shot_cot.examples.get(task_type, [])
    return {
        "task_type": task_type,
        "examples": [
            {
                "problem": e.problem,
                "reasoning": e.reasoning,
                "answer": e.answer
            }
            for e in examples
        ]
    }


@app.get("/v1/task-types")
async def list_task_types():
    """List available task types."""
    return {
        "task_types": list(zero_shot_cot.triggers.keys()),
        "example_types": list(few_shot_cot.examples.keys())
    }


@app.get("/health")
async def health():
    return {"status": "healthy"}
References
- Chain-of-Thought Paper: https://arxiv.org/abs/2201.11903
- Self-Consistency Paper: https://arxiv.org/abs/2203.11171
- Tree of Thoughts: https://arxiv.org/abs/2305.10601
- Zero-Shot CoT: https://arxiv.org/abs/2205.11916
Conclusion
Chain-of-thought prompting is one of the most effective techniques for improving LLM reasoning. Zero-shot CoT with “let’s think step by step” provides immediate gains with no setup—just append the trigger phrase to your prompt. Few-shot CoT with worked examples teaches the model your preferred reasoning style and format, especially valuable for domain-specific problems. Self-consistency sampling generates multiple reasoning paths and takes the majority vote, dramatically improving reliability on complex problems. Adaptive approaches assess problem complexity first, applying CoT only when beneficial—simple questions don’t need elaborate reasoning chains. For production systems, start with zero-shot CoT for quick wins, add few-shot examples for your specific domain, and use self-consistency for high-stakes decisions where accuracy matters more than latency. The key insight is that making the model show its work doesn’t just help you understand its reasoning—it actually improves the quality of that reasoning.