Tips and Tricks #95: Cache LLM Responses for Cost Reduction

Cache LLM responses to avoid redundant API calls and cut costs. The snippet below implements exact-match caching keyed on a prompt hash; a semantic variant that also catches similarly worded prompts is sketched right after it.

Code Snippet

import hashlib
from typing import Optional

class LLMCache:
    def __init__(self):
        self._cache = {}
    
    def _hash_prompt(self, prompt: str, model: str) -> str:
        """Create deterministic hash for cache key."""
        content = f"{model}:{prompt}"
        return hashlib.sha256(content.encode()).hexdigest()
    
    def get(self, prompt: str, model: str) -> Optional[str]:
        key = self._hash_prompt(prompt, model)
        return self._cache.get(key)
    
    def set(self, prompt: str, model: str, response: str):
        key = self._hash_prompt(prompt, model)
        self._cache[key] = response

cache = LLMCache()

def cached_llm_call(prompt: str, model: str = "gpt-4") -> str:
    # Check cache first (use "is not None" so an empty cached response still counts as a hit)
    cached = cache.get(prompt, model)
    if cached is not None:
        return cached
    
    # Make the actual API call; call_openai_api is your own wrapper around the API client
    response = call_openai_api(prompt, model)
    
    # Cache for future use
    cache.set(prompt, model, response)
    return response
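
The hash-based cache above only recognizes byte-for-byte identical prompts. Below is a minimal sketch of a semantic variant, assuming you pass in an embed_fn (for example, a thin wrapper around an embeddings API) that returns a 1-D numpy vector per prompt; the linear scan and the 0.95 similarity threshold are illustrative choices, not tuned values.

import numpy as np
from typing import Callable, List, Optional, Tuple

class SemanticLLMCache:
    """Cache keyed by prompt embeddings instead of exact text."""

    def __init__(self, embed_fn: Callable[[str], np.ndarray], threshold: float = 0.95):
        # embed_fn is supplied by the caller, e.g. a wrapper around an embeddings API
        self._embed = embed_fn
        self._threshold = threshold
        self._entries: List[Tuple[np.ndarray, str]] = []  # (embedding, response) pairs

    def get(self, prompt: str) -> Optional[str]:
        query = self._embed(prompt)
        for vec, response in self._entries:
            # Cosine similarity between the new prompt and a cached one
            sim = float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
            if sim >= self._threshold:
                return response
        return None

    def set(self, prompt: str, response: str) -> None:
        self._entries.append((self._embed(prompt), response))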

Why This Helps

  • Can reduce API costs by 30-70% for workloads with many repeated queries; actual savings depend on how often prompts repeat
  • Faster response times for cached prompts, since no network round trip is needed
  • Enables offline development and testing against previously cached responses

How to Test

  • Call the same prompt twice and verify the second call is a cache hit (see the stub-based sketch after this list)
  • Monitor API call counts and confirm they drop once the cache warms up
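
One quick check is to stub out call_openai_api with a counter and confirm the second identical call never reaches it. This sketch assumes the LLMCache and cached_llm_call definitions above live in the same script.

# Stub the API call so the test runs offline and counts invocations
call_count = 0

def call_openai_api(prompt: str, model: str) -> str:
    global call_count
    call_count += 1
    return f"stub response for: {prompt}"

first = cached_llm_call("Summarize this article.")
second = cached_llm_call("Summarize this article.")

assert first == second
assert call_count == 1  # the second call was served from the cache
print("cache hit verified; API called", call_count, "time")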

When to Use

Any application that sends repeated or near-identical prompts: chatbots answering common questions, content generation pipelines, or repeated analysis of the same inputs.

Performance/Security Notes

The in-memory dictionary above lives in a single process and grows without bound. For production, back the cache with a shared store such as Redis, and set a TTL so time-sensitive responses expire instead of being served stale.
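
A minimal sketch of a Redis-backed variant with a TTL, assuming the redis-py client and a Redis instance on localhost; the key hashing mirrors the LLMCache above, and the one-hour TTL is an arbitrary example.

import hashlib
from typing import Optional

import redis  # pip install redis

class RedisLLMCache:
    def __init__(self, ttl_seconds: int = 3600):
        # decode_responses=True so get() returns str instead of bytes
        self._client = redis.Redis(host="localhost", port=6379, decode_responses=True)
        self._ttl = ttl_seconds

    def _hash_prompt(self, prompt: str, model: str) -> str:
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get(self, prompt: str, model: str) -> Optional[str]:
        return self._client.get(self._hash_prompt(prompt, model))

    def set(self, prompt: str, model: str, response: str) -> None:
        # ex= sets the TTL so stale entries expire automatically
        self._client.set(self._hash_prompt(prompt, model), response, ex=self._ttl)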

Try this tip in your next project and share your results in the comments!

