Identifying Memory Leaks in Long-Running Celery Workers with tracemalloc
You’ll notice memory usage gradually climbing in your Celery workers after days or weeks of processing thousands of tasks, eventually triggering OOM kills as RAM is exhausted. This leak behavior, common in long-running processes, typically arises from global variables, unclosed database connections, or objects that evade garbage collection.
In this guide, we’ll use Python’s built-in tracemalloc module to identify the source of memory leaks in Celery workers. Unlike external profilers, tracemalloc has shipped with the standard library since Python 3.4, requires no third-party dependencies, and provides precise allocation tracking.
What you’ll learn:
- How to detect memory leaks using tracemalloc snapshots
- Integrating tracemalloc into Celery workers with signal handlers
- Common leak patterns in Celery tasks and how to fix them
- Production-ready monitoring strategies
Related: FastAPI BackgroundTasks vs Celery
The Challenge of Memory Leaks in Long-Running Processes
Memory leaks are a classic problem in software engineering, particularly for long-running processes like Celery workers. Unlike short-lived scripts, workers process thousands or millions of tasks over days or weeks, causing small memory allocations to accumulate until they become critical.
Historically, detecting these leaks required specialized tools like valgrind for C/C++ or third-party Python tools like objgraph. However, these tools often introduce significant overhead or require complex setup. Python’s tracemalloc module, introduced in Python 3.4 (PEP 454), provides a lightweight, standard-library alternative for tracking memory allocations, even in production environments.
The core principle: take snapshots of memory usage at different points, compare them to identify growth, and trace allocations back to specific lines of code. This approach works particularly well for Celery workers, where memory growth often correlates with the number of tasks processed.
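Before wiring this into Celery, the whole workflow fits in a dozen lines. Here is a minimal, Celery-free sketch; the `simulate_leak` helper is a stand-in for a leaky task, not part of any real codebase:

```python
import tracemalloc

def simulate_leak(store, n, chunk="data"):
    # Stand-in for a leaky task: each call appends a fresh ~40 KB string.
    # (Multiplying a variable, not a literal, defeats constant folding, so
    # a new object really is allocated on every iteration.)
    for _ in range(n):
        store.append(chunk * 10_000)

leaky_store = []
tracemalloc.start()

before = tracemalloc.take_snapshot()
simulate_leak(leaky_store, 500)  # "process some tasks"
after = tracemalloc.take_snapshot()

# Group allocation diffs by source line; the largest growth sorts first.
stats = after.compare_to(before, 'lineno')
top = stats[0]
print(top)  # should point at the store.append(...) line above
print(f"grew by {top.size_diff / 1024**2:.1f} MiB across {top.count_diff} new blocks")
```

The same three steps (snapshot, work, snapshot-and-compare) are what we will hook into the worker below.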
Celery Memory Leak Symptoms
Long-running workers (run with no --max-tasks-per-child limit, which is the default) handle 100k+ tasks, but:
- RSS/PSS grows unbounded → OOM kills
- htop: 10 GB+ after weeks
- Flower: tasks slow down, timeouts appear
Leak example (tasks.py):
```python
leaky_list = []  # Module-level global, shared by every task in this worker process!

@app.task
def process_data():
    leaky_list.append("data" * 50_000)  # ~200 KB per task, never cleared
    return len(leaky_list)
```
To run the worker without child-process recycling, simply omit --max-tasks-per-child (unlimited is the default):

```shell
$ celery -A app worker --loglevel=info
```
After 10k tasks: RSS 1.9GB.
tracemalloc: Stdlib Memory Tracer
tracemalloc tracks every allocation by filename:lineno.
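Snapshots come next, but even before that, tracemalloc.get_traced_memory() gives a one-line view of current and peak traced usage. A minimal, Celery-free sketch:

```python
import tracemalloc

tracemalloc.start()
blobs = [bytearray(100_000) for _ in range(20)]  # ~2 MB of live allocations

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024**2:.1f} MiB, peak: {peak / 1024**2:.1f} MiB")

del blobs  # free the blobs: current drops, peak stays where it was
current_after, _ = tracemalloc.get_traced_memory()
print(f"after free: {current_after / 1024**2:.1f} MiB")
```

A climbing `current` across tasks is the first hint that a snapshot comparison is worth the effort.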
Alternative Memory Profiling Tools
Before diving into tracemalloc, it’s worth noting other options available for Python memory profiling:
- objgraph: excellent for visualizing object references and finding reference cycles, but requires manual inspection and doesn’t track allocations over time as easily.
- memory_profiler: provides line-by-line memory usage but can be slow and intrusive, making it less suitable for production.
- filprofiler: focuses on memory allocation in data-processing pipelines (Pandas/NumPy); great for data-science workloads but overkill for general Celery tasks.
- tracemalloc: our choice for this guide because it’s built-in, low-overhead, and provides historical snapshot comparisons, which is exactly what detecting leaks in long-running workers requires.
For Celery workers specifically, tracemalloc strikes the right balance between detail and performance.
Tip: objgraph visualizes references well but requires manual use and misses time-series growth; memory_profiler offers line-by-line detail but slows execution significantly; filprofiler targets data pipelines. tracemalloc, in the stdlib with low overhead, delivers the snapshot diffs you need for leak hunting over 10k+ tasks.
Basic usage:
```python
import tracemalloc

tracemalloc.start(10)  # keep 10 frames of traceback per allocation

snapshot1 = tracemalloc.take_snapshot()
# ... run leaky tasks ...
snapshot2 = tracemalloc.take_snapshot()

stats = snapshot2.compare_to(snapshot1, 'lineno')
for stat in stats[:5]:
    print(stat)
```
Output:

```
tasks.py:8: size=1.9 GiB (+1.9 GiB), count=10000 (+10000), average=199.9 KiB
```
Pinpoints leaky_list.append().
Integrating tracemalloc in Celery Workers
Method 1: Worker Signal Handler (Production-Ready)
Handle SIGUSR1 so you can dump a snapshot on demand.
app/worker_signals.py:
```python
import os
import signal
import tracemalloc

from celery.signals import worker_process_init

@worker_process_init.connect
def init_tracemalloc(**kwargs):
    # Runs once in each prefork child; keep 10 frames per allocation.
    tracemalloc.start(10)

def dump_trace(sig, frame):
    snapshot = tracemalloc.take_snapshot()
    path = f"/tmp/celery-leak-{os.getpid()}.bin"
    snapshot.dump(path)
    print(f"Snapshot dumped: {path}")

signal.signal(signal.SIGUSR1, dump_trace)
```
celery.py (app module):

```python
import worker_signals  # noqa: F401 (importing this module registers the handlers)
```
Run it with:

```shell
$ celery -A app worker -Q default
```
If you suspect a leak:

```shell
$ kill -USR1 <PID>
```
Method 2: Periodic Snapshots (Dev)
Keep a per-process task counter and snapshot every 1000 tasks:

```python
import tracemalloc

task_count = 0  # per-process counter: each prefork child has its own copy

@app.task(bind=True)
def leaky_task(self):
    global task_count
    task_count += 1
    if task_count % 1000 == 0:
        snapshot = tracemalloc.take_snapshot()
        top = snapshot.statistics('lineno')[:3]
        self.update_state(state='PROGRESS',
                          meta={'mem_top': [str(s) for s in top]})
    leaky_list.append("leak")  # the leak under investigation
```
Flower shows progress.
Analyzing Snapshots: Diffs & Filtering
Load a dumped snapshot and compare it against a live one:

```python
import tracemalloc

# Assumes tracemalloc.start() was called earlier in this process.
snapshot1 = tracemalloc.Snapshot.load("/tmp/before.bin")
snapshot2 = tracemalloc.take_snapshot()

diff = snapshot2.compare_to(snapshot1, 'traceback')
for stat in diff[:10]:
    print(f"{stat.traceback.format()[-1]}: {stat.size_diff / 1024**2:.1f} MiB")
```
Filter the traces down to your own code:

```python
filtered = snapshot.filter_traces((
    tracemalloc.Filter(False, "<unknown>"),   # drop untraceable frames
    tracemalloc.Filter(True, "*/yourapp/*"),  # keep only your package's files
))
# filter_traces() returns a new Snapshot, so query it directly:
top = filtered.statistics('lineno')[:5]
```
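As a self-contained illustration using only the stdlib (here the filters strip import machinery and tracemalloc’s own frames rather than selecting an application package):

```python
import tracemalloc

tracemalloc.start()
data = [bytes(10_000) for _ in range(100)]  # ~1 MiB, attributed to this line
snapshot = tracemalloc.take_snapshot()

# Filter(False, pattern) excludes matching filenames;
# Filter(True, pattern) keeps only matches.
filtered = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
))

top = filtered.statistics('lineno')[:5]
for stat in top:
    print(stat)
```

The top entry should be the list comprehension above, since nothing else between start() and the snapshot allocates anywhere near a mebibyte.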
Common Celery Leaks & Fixes
| Leak Source | Symptom | tracemalloc Trace | Fix |
|---|---|---|---|
| Global lists | RSS linear | list.append() | collections.deque(maxlen=1000) |
| DB conns | Conn pool exhaust | psycopg2.connect() no close() | contextlib.closing() or pools |
| Pandas DFs | 1GB+ objs | pd.DataFrame() | del df; gc.collect() |
| Thread locals | Per-task growth | threading.local() | Clear post-task |
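The DB-connection row deserves a concrete sketch. Here sqlite3 stands in for psycopg2, and process_row is a hypothetical task body; the point is that contextlib.closing() releases the connection even when the task raises:

```python
import sqlite3
from contextlib import closing

DB_PATH = ":memory:"  # stand-in for your real DSN

def process_row(task_id: int) -> int:
    # closing() guarantees conn.close() on exit, so each task releases
    # its connection instead of leaking one per invocation.
    with closing(sqlite3.connect(DB_PATH)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS results (id INTEGER)")
            conn.execute("INSERT INTO results VALUES (?)", (task_id,))
        return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]

print(process_row(1))  # 1: each in-memory DB is fresh per connection
```

With psycopg2 the pattern is identical; a connection pool shared across tasks is the other standard fix.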
Fixed task:
```python
from collections import deque

leaky_list = deque(maxlen=1000)  # oldest entries are evicted automatically

@app.task
def safe_task():
    leaky_list.append("data")
```
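A quick check that the maxlen bound actually holds no matter how many tasks run (a toy simulation, not Celery code):

```python
from collections import deque

bounded = deque(maxlen=1000)
for i in range(10_000):  # simulate 10k task executions
    bounded.append(f"data-{i}")

print(len(bounded))  # 1000: the deque never grows past maxlen
print(bounded[0])    # "data-9000": the oldest surviving entry
```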
Benchmarks: RSS Over 10k Tasks
psutil monitor (monitor.py):

```python
import sys
import time

import psutil

# Pass the Celery worker's PID on the command line so we watch the
# worker process, not this monitor script.
p = psutil.Process(int(sys.argv[1]))
while True:
    print(f"RSS: {p.memory_info().rss / 1024**2:.0f} MiB")
    time.sleep(10)
```

Run it in a separate terminal:

```shell
$ python monitor.py <worker-pid>
```
| Version | 1k Tasks | 5k | 10k | Stable? |
|---|---|---|---|---|
| Leaky | 210 MB | 1.1 GB | 2.0 GB | No |
| Fixed | 210 MB | 211 MB | 210 MB | Yes |
--max-tasks-per-child=1000 works as a temporary workaround.
Tip: Restarting each child after 1000 tasks bounds the leak, but incurs process-startup overhead; use it while you fix the root cause.
Production: Flower + Alerts
- Flower: run the worker with --events and add a custom memory plugin
- Prometheus: export a celery_worker_memory_bytes gauge per worker
- Restart policy: --max-tasks-per-child=5000 once the leak itself is fixed
Conclusion
Memory leaks in long-running Celery workers can lead to production outages and performance degradation. Python’s tracemalloc module provides a powerful, dependency-free solution for identifying these leaks with precision.
Key Takeaways:
- Use tracemalloc snapshots to compare memory usage before and after task execution
- Implement signal handlers (like SIGUSR1) for production-safe memory dumps
- Fix common patterns: replace global lists with collections.deque, use context managers for DB connections, and clear thread-local storage
- Monitor continuously with tools like psutil and Flower to track RSS growth over time
Next Steps:
- Implement the signal handler pattern in your Celery workers
- Set up periodic memory profiling during development
- Consider objgraph for detecting reference cycles if tracemalloc shows persistent leaks
The tools and techniques in this guide require no external dependencies and work on any supported Python 3 version (tracemalloc has been in the standard library since 3.4). Start monitoring your workers today to catch memory issues before they become critical failures.