Use Python generators to process large datasets without loading everything into memory.
Code Snippet
# Before: Loads entire file into memory
def process_file_eager(filename):
    with open(filename) as f:
        lines = f.readlines()  # All in memory!
        return [parse_line(line) for line in lines]

# After: Streams data lazily
def process_file_lazy(filename):
    with open(filename) as f:
        for line in f:
            yield parse_line(line)

# Usage: Memory stays constant regardless of file size
for record in process_file_lazy("huge_file.csv"):
    process_record(record)
Why This Helps
- Constant memory usage regardless of data size
- Enables processing of files larger than RAM
- Integrates seamlessly with for loops and itertools
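Because process_file_lazy returns an iterator, it slots directly into itertools. Here is a minimal sketch; the islice limit and the None filter are illustrative assumptions, not part of the snippet above:

from itertools import islice

# Take only the first 1,000 parsed records; the rest of the file is never read
first_thousand = islice(process_file_lazy("huge_file.csv"), 1000)

# Stack a lazy filter on top; nothing runs until the loop below pulls values
non_empty = (record for record in first_thousand if record is not None)

for record in non_empty:
    process_record(record)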
How to Test
- Monitor memory with memory_profiler
- Compare peak memory of the eager and lazy versions on a large file (see the sketch below)
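One rough way to run that comparison is with memory_profiler's memory_usage helper; this is a sketch assuming memory_profiler is installed and the functions from the snippet above are in scope:

from memory_profiler import memory_usage

def run_eager():
    for record in process_file_eager("huge_file.csv"):
        pass

def run_lazy():
    for record in process_file_lazy("huge_file.csv"):
        pass

# memory_usage samples the process's memory while the callable runs;
# taking max() of the samples approximates peak usage in MiB
peak_eager = max(memory_usage((run_eager, (), {})))
peak_lazy = max(memory_usage((run_lazy, (), {})))
print(f"eager peak: {peak_eager:.1f} MiB, lazy peak: {peak_lazy:.1f} MiB")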
When to Use
ETL pipelines, log processing, any scenario with large sequential data.
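For example, a log-processing job can be written as a chain of small generators, each stage pulling one line at a time from the stage before it. The file name, the ERROR filter, and the assumed line layout below are hypothetical:

def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def only_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

def extract_message(lines):
    for line in lines:
        # Assumes a "timestamp level message" layout; keep only the message
        yield line.split(" ", 2)[-1]

# Each stage holds a single line at a time, so the whole log streams through
for message in extract_message(only_errors(read_lines("app.log"))):
    print(message)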
Performance/Security Notes
Generators can only be iterated once. Use itertools.tee() if multiple passes are needed.
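A quick sketch of the tee() pattern, with one caveat worth knowing; the record-counting pass is just an illustration:

from itertools import tee

records = process_file_lazy("huge_file.csv")
first_pass, second_pass = tee(records, 2)

total = sum(1 for _ in first_pass)  # first pass: count records
print(f"total records: {total}")

# Caveat: because first_pass was fully consumed before second_pass started,
# tee() had to buffer every item in between, which gives up the constant-memory
# benefit. Interleave the passes, or simply call process_file_lazy() again,
# when the dataset is genuinely large.
for record in second_pass:
    process_record(record)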
Try this tip in your next project and share your results in the comments!