Tips and Tricks #191: Cache LLM Responses for Cost Reduction

Cache LLM responses to avoid redundant API calls and reduce costs. The snippet below implements a simple exact-match cache; a semantic variant that also matches similar prompts is sketched after it.

Code Snippet

import hashlib
from typing import Optional

class LLMCache:
    def __init__(self):
        self._cache = {}
    
    def _hash_prompt(self, prompt: str, model: str) -> str:
        """Create deterministic hash for cache key."""
        content = f"{model}:{prompt}"
        return hashlib.sha256(content.encode()).hexdigest()
    
    def get(self, prompt: str, model: str) -> Optional[str]:
        key = self._hash_prompt(prompt, model)
        return self._cache.get(key)
    
    def set(self, prompt: str, model: str, response: str):
        key = self._hash_prompt(prompt, model)
        self._cache[key] = response

cache = LLMCache()

def cached_llm_call(prompt: str, model: str = "gpt-4") -> str:
    # Check cache first ("is not None" so an empty cached response still counts as a hit)
    cached = cache.get(prompt, model)
    if cached is not None:
        return cached
    
    # Make the actual API call; call_openai_api stands in for your own LLM client wrapper
    response = call_openai_api(prompt, model)
    
    # Cache for future use
    cache.set(prompt, model, response)
    return response
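
The cache above only hits on byte-for-byte identical prompts. A semantic cache, as the intro mentions, also matches prompts whose embeddings are close. Here is a minimal sketch, assuming you supply an embed_fn (any function mapping a prompt to a NumPy vector); the 0.95 threshold and the linear scan are illustrative choices, not part of the original snippet:

import numpy as np
from typing import Callable, Optional

class SemanticLLMCache:
    def __init__(self, embed_fn: Callable[[str], np.ndarray], threshold: float = 0.95):
        # embed_fn is hypothetical: plug in whatever embedding model you use
        self._embed_fn = embed_fn
        self._threshold = threshold
        self._entries = []  # list of (embedding, response) pairs

    def get(self, prompt: str) -> Optional[str]:
        query = self._embed_fn(prompt)
        for emb, response in self._entries:
            # Cosine similarity between the new prompt and a cached one
            sim = np.dot(query, emb) / (np.linalg.norm(query) * np.linalg.norm(emb))
            if sim >= self._threshold:
                return response  # close enough to reuse the cached answer
        return None

    def set(self, prompt: str, response: str):
        self._entries.append((self._embed_fn(prompt), response))

For real workloads you would replace the linear scan with a vector index, but the lookup logic stays the same.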

Why This Helps

  • Can reduce API costs by 30-70% for workloads with many repeated queries
  • Faster response times for cached prompts
  • Enables offline development and testing once responses are cached (persist the cache for reuse across runs)

How to Test

  • Call the same prompt twice and verify the second call is a cache hit (see the sketch below)
  • Monitor API call counts before and after enabling the cache
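
One way to check both points at once, assuming everything lives in a single script so that the counting stub below is the call_openai_api that cached_llm_call resolves at call time:

# Stub out the API layer and count invocations to prove the second lookup
# never reaches the API.
call_count = 0

def call_openai_api(prompt: str, model: str) -> str:
    # Stand-in for the real client call
    global call_count
    call_count += 1
    return f"stub response for: {prompt}"

first = cached_llm_call("Summarize the plot of Hamlet.")
second = cached_llm_call("Summarize the plot of Hamlet.")

assert first == second
assert call_count == 1  # second call was served from the cache
print("cache hit verified, API called once")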

When to Use

Any application that sends repeated or near-identical prompts: chatbots answering common questions, content generation pipelines, and repeated analysis runs.

Performance/Security Notes

The in-memory dictionary above is per-process and lost on restart; use Redis (or another shared store) for production caching, and set a TTL so time-sensitive responses expire, as sketched below.
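
A minimal sketch of that variant with the redis-py client, assuming a Redis server on localhost:6379; the one-hour TTL and the "llmcache:" key prefix are illustrative choices:

import hashlib
from typing import Optional

import redis  # pip install redis

class RedisLLMCache:
    def __init__(self, ttl_seconds: int = 3600):
        # Assumes a Redis server reachable on localhost:6379
        self._client = redis.Redis(host="localhost", port=6379, decode_responses=True)
        self._ttl = ttl_seconds

    def _hash_prompt(self, prompt: str, model: str) -> str:
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get(self, prompt: str, model: str) -> Optional[str]:
        return self._client.get(f"llmcache:{self._hash_prompt(prompt, model)}")

    def set(self, prompt: str, model: str, response: str):
        # SETEX stores the value with an expiry, so stale entries age out automatically
        self._client.setex(f"llmcache:{self._hash_prompt(prompt, model)}", self._ttl, response)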

Try this tip in your next project and share your results in the comments!

