Identifying Memory Leaks in Long-Running Celery Workers with tracemalloc
You’ll notice memory usage gradually climbing in your Celery workers after days or weeks of processing thousands of tasks, eventually triggering OOM kills as RAM is exhausted. This leak behavior, common in long-running processes, typically arises from global variables, unclosed database connections, or objects that evade garbage collection.
In this guide, we’ll use Python’s built-in tracemalloc module to identify the source of memory leaks in Celery workers. Unlike external profilers, tracemalloc has shipped with the standard library since Python 3.4, requires no third-party dependencies, and provides precise allocation tracking.
What you’ll learn:
- How to detect memory leaks using tracemalloc snapshots
- Integrating tracemalloc into Celery workers with signal handlers
- Common leak patterns in Celery tasks and how to fix them
- Production-ready monitoring strategies
Related: FastAPI BackgroundTasks vs Celery
The Challenge of Memory Leaks in Long-Running Processes
Memory leaks are a classic problem in software engineering, particularly for long-running processes like Celery workers. Unlike short-lived scripts, workers process thousands or millions of tasks over days or weeks, causing small memory allocations to accumulate until they become critical.
Historically, detecting these leaks required specialized tools like valgrind for C/C++ or third-party Python tools like objgraph. However, these tools often introduce significant overhead or require complex setup. Python’s tracemalloc module, introduced in Python 3.4 (PEP 454), provides a lightweight, standard-library alternative for tracking memory allocations, even in production environments.
The core principle: take snapshots of memory usage at different points, compare them to identify growth, and trace allocations back to specific lines of code. This approach works particularly well for Celery workers, where memory growth often correlates with the number of tasks processed.
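Before wiring this into Celery, the whole workflow fits in a dozen lines. Here is a minimal, Celery-free sketch; the `simulate_leak` helper is a stand-in for a leaky task, not part of any real codebase:

```python
import tracemalloc

def simulate_leak(store, n, chunk="data"):
    # Stand-in for a leaky task: each call appends a fresh ~40 KB string.
    # (Multiplying a variable, not a literal, defeats constant folding, so
    # a new object really is allocated on every iteration.)
    for _ in range(n):
        store.append(chunk * 10_000)

leaky_store = []
tracemalloc.start()

before = tracemalloc.take_snapshot()
simulate_leak(leaky_store, 500)  # "process some tasks"
after = tracemalloc.take_snapshot()

# Group allocation diffs by source line; the largest growth sorts first.
stats = after.compare_to(before, 'lineno')
top = stats[0]
print(top)  # should point at the store.append(...) line above
print(f"grew by {top.size_diff / 1024**2:.1f} MiB across {top.count_diff} new blocks")
```

The same three steps (snapshot, work, snapshot-and-compare) are what we will hook into the worker below.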
Celery Memory Leak Symptoms
Long-running workers (run with no --max-tasks-per-child limit, which is the default) handle 100k+ tasks, but:
- RSS/PSS grows unbounded → OOM kills
- htop: 10 GB+ after weeks
- Flower: tasks slow down, timeouts appear
Leak example (tasks.py):
```python
leaky_list = []  # Module-level global, shared by every task in this worker process!

@app.task
def process_data():
    leaky_list.append("data" * 50_000)  # ~200 KB per task, never cleared
    return len(leaky_list)
```
To run the worker without child-process recycling, simply omit --max-tasks-per-child (unlimited is the default):

```shell
$ celery -A app worker --loglevel=info
```
After 10k tasks: RSS 1.9GB.
tracemalloc: Stdlib Memory Tracer
tracemalloc tracks every allocation by filename:lineno.
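Snapshots come next, but even before that, tracemalloc.get_traced_memory() gives a one-line view of current and peak traced usage. A minimal, Celery-free sketch:

```python
import tracemalloc

tracemalloc.start()
blobs = [bytearray(100_000) for _ in range(20)]  # ~2 MB of live allocations

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024**2:.1f} MiB, peak: {peak / 1024**2:.1f} MiB")

del blobs  # free the blobs: current drops, peak stays where it was
current_after, _ = tracemalloc.get_traced_memory()
print(f"after free: {current_after / 1024**2:.1f} MiB")
```

A climbing `current` across tasks is the first hint that a snapshot comparison is worth the effort.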
Alternative Memory Profiling Tools
Before diving into tracemalloc, it’s worth noting other options available for Python memory profiling:
- objgraph: excellent for visualizing object references and finding reference cycles, but requires manual inspection and doesn’t track allocations over time as easily.
- memory_profiler: provides line-by-line memory usage but can be slow and intrusive, making it less suitable for production.
- filprofiler: focuses on memory allocation in data-processing pipelines (Pandas/NumPy); great for data-science workloads but overkill for general Celery tasks.
- tracemalloc: our choice for this guide because it’s built-in, low-overhead, and provides historical snapshot comparisons, which is exactly what detecting leaks in long-running workers requires.
For Celery workers specifically, tracemalloc strikes the right balance between detail and performance.
Tip: objgraph visualizes references well but requires manual use and misses time-series growth; memory_profiler offers line-by-line detail but slows execution significantly; filprofiler targets data pipelines. tracemalloc, in the stdlib with low overhead, delivers the snapshot diffs you need for leak hunting over 10k+ tasks.
Basic usage:
```python
import tracemalloc

tracemalloc.start(10)  # keep 10 frames of traceback per allocation

snapshot1 = tracemalloc.take_snapshot()
# ... run leaky tasks ...
snapshot2 = tracemalloc.take_snapshot()

stats = snapshot2.compare_to(snapshot1, 'lineno')
for stat in stats[:5]:
    print(stat)
```
Output:

```
tasks.py:8: size=1.9 GiB (+1.9 GiB), count=10000 (+10000), average=199.9 KiB
```
Pinpoints leaky_list.append().
Integrating tracemalloc in Celery Workers
Method 1: Worker Signal Handler (Production-Ready)
Handle SIGUSR1 so you can dump a snapshot on demand.
app/worker_signals.py:
```python
import os
import signal
import tracemalloc

from celery.signals import worker_process_init

@worker_process_init.connect
def init_tracemalloc(**kwargs):
    # Runs once in each prefork child; keep 10 frames per allocation.
    tracemalloc.start(10)

def dump_trace(sig, frame):
    snapshot = tracemalloc.take_snapshot()
    path = f"/tmp/celery-leak-{os.getpid()}.bin"
    snapshot.dump(path)
    print(f"Snapshot dumped: {path}")

signal.signal(signal.SIGUSR1, dump_trace)
```
celery.py (app module):

```python
import worker_signals  # noqa: F401 (importing this module registers the handlers)
```
Run it with:

```shell
$ celery -A app worker -Q default
```
If you suspect a leak:

```shell
$ kill -USR1 <PID>
```
Method 2: Periodic Snapshots (Dev)
Keep a per-process task counter and snapshot every 1000 tasks:

```python
import tracemalloc

task_count = 0  # per-process counter: each prefork child has its own copy

@app.task(bind=True)
def leaky_task(self):
    global task_count
    task_count += 1
    if task_count % 1000 == 0:
        snapshot = tracemalloc.take_snapshot()
        top = snapshot.statistics('lineno')[:3]
        self.update_state(state='PROGRESS',
                          meta={'mem_top': [str(s) for s in top]})
    leaky_list.append("leak")  # the leak under investigation
```
Flower shows progress.
Analyzing Snapshots: Diffs & Filtering
Load a dumped snapshot and compare it against a live one:

```python
import tracemalloc

# Assumes tracemalloc.start() was called earlier in this process.
snapshot1 = tracemalloc.Snapshot.load("/tmp/before.bin")
snapshot2 = tracemalloc.take_snapshot()

diff = snapshot2.compare_to(snapshot1, 'traceback')
for stat in diff[:10]:
    print(f"{stat.traceback.format()[-1]}: {stat.size_diff / 1024**2:.1f} MiB")
```
Filter the traces down to your own code:

```python
filtered = snapshot.filter_traces((
    tracemalloc.Filter(False, "<unknown>"),   # drop untraceable frames
    tracemalloc.Filter(True, "*/yourapp/*"),  # keep only your package's files
))
# filter_traces() returns a new Snapshot, so query it directly:
top = filtered.statistics('lineno')[:5]
```
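As a self-contained illustration using only the stdlib (here the filters strip import machinery and tracemalloc’s own frames rather than selecting an application package):

```python
import tracemalloc

tracemalloc.start()
data = [bytes(10_000) for _ in range(100)]  # ~1 MiB, attributed to this line
snapshot = tracemalloc.take_snapshot()

# Filter(False, pattern) excludes matching filenames;
# Filter(True, pattern) keeps only matches.
filtered = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
))

top = filtered.statistics('lineno')[:5]
for stat in top:
    print(stat)
```

The top entry should be the list comprehension above, since nothing else between start() and the snapshot allocates anywhere near a mebibyte.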
Common Celery Leaks & Fixes
| Leak Source | Symptom | tracemalloc Trace | Fix |
|---|---|---|---|
| Global lists | RSS linear | list.append() | collections.deque(maxlen=1000) |
| DB conns | Conn pool exhaust | psycopg2.connect() no close() | contextlib.closing() or pools |
| Pandas DFs | 1GB+ objs | pd.DataFrame() | del df; gc.collect() |
| Thread locals | Per-task growth | threading.local() | Clear post-task |
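The DB-connection row deserves a concrete sketch. Here sqlite3 stands in for psycopg2, and process_row is a hypothetical task body; the point is that contextlib.closing() releases the connection even when the task raises:

```python
import sqlite3
from contextlib import closing

DB_PATH = ":memory:"  # stand-in for your real DSN

def process_row(task_id: int) -> int:
    # closing() guarantees conn.close() on exit, so each task releases
    # its connection instead of leaking one per invocation.
    with closing(sqlite3.connect(DB_PATH)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS results (id INTEGER)")
            conn.execute("INSERT INTO results VALUES (?)", (task_id,))
        return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]

print(process_row(1))  # 1: each in-memory DB is fresh per connection
```

With psycopg2 the pattern is identical; a connection pool shared across tasks is the other standard fix.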
Fixed task:
```python
from collections import deque

leaky_list = deque(maxlen=1000)  # oldest entries are evicted automatically

@app.task
def safe_task():
    leaky_list.append("data")
```
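A quick check that the maxlen bound actually holds no matter how many tasks run (a toy simulation, not Celery code):

```python
from collections import deque

bounded = deque(maxlen=1000)
for i in range(10_000):  # simulate 10k task executions
    bounded.append(f"data-{i}")

print(len(bounded))  # 1000: the deque never grows past maxlen
print(bounded[0])    # "data-9000": the oldest surviving entry
```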
Benchmarks: RSS Over 10k Tasks
psutil monitor (monitor.py):

```python
import sys
import time

import psutil

# Pass the Celery worker's PID on the command line so we watch the
# worker process, not this monitor script.
p = psutil.Process(int(sys.argv[1]))
while True:
    print(f"RSS: {p.memory_info().rss / 1024**2:.0f} MiB")
    time.sleep(10)
```

Run it in a separate terminal:

```shell
$ python monitor.py <worker-pid>
```
| Version | 1k Tasks | 5k | 10k | Stable? |
|---|---|---|---|---|
| Leaky | 210 MB | 1.1 GB | 2.0 GB | No |
| Fixed | 210 MB | 211 MB | 210 MB | Yes |
--max-tasks-per-child=1000 works as a temporary workaround.
Tip: Restarting each child after 1000 tasks bounds the leak, but incurs process-startup overhead; use it while you fix the root cause.
Production: Flower + Alerts
- Flower: run the worker with --events and add a custom memory plugin
- Prometheus: export a celery_worker_memory_bytes gauge per worker
- Restart policy: --max-tasks-per-child=5000 once the leak itself is fixed
Conclusion
Memory leaks in long-running Celery workers can lead to production outages and performance degradation. Python’s tracemalloc module provides a powerful, dependency-free solution for identifying these leaks with precision.
Key Takeaways:
- Use tracemalloc snapshots to compare memory usage before and after task execution
- Implement signal handlers (like SIGUSR1) for production-safe memory dumps
- Fix common patterns: replace global lists with collections.deque, use context managers for DB connections, and clear thread-local storage
- Monitor continuously with tools like psutil and Flower to track RSS growth over time
Next Steps:
- Implement the signal handler pattern in your Celery workers
- Set up periodic memory profiling during development
- Consider objgraph for detecting reference cycles if tracemalloc shows persistent leaks
The tools and techniques in this guide require no external dependencies and work on any supported Python 3 version (tracemalloc has been in the standard library since 3.4). Start monitoring your workers today to catch memory issues before they become critical failures.