The go-to resource for upgrading Python, Django, Flask, and your dependencies.

Identifying Memory Leaks in Long-Running Celery Workers with tracemalloc


You’ll notice memory usage gradually climbing in your Celery workers after days or weeks of processing thousands of tasks—eventually triggering OOM kills as RAM is exhausted. This memory leak behavior, common in long-running processes, typically arises from global variables, unclosed database connections, or objects that evade garbage collection.

In this guide, we’ll use Python’s built-in tracemalloc module to identify the source of memory leaks in Celery workers. Unlike external profilers, tracemalloc is part of the standard library (Python 3.4+), requires no third-party dependencies, and provides precise allocation tracking.

What you’ll learn:

  • How to detect memory leaks using tracemalloc snapshots
  • Integrating tracemalloc into Celery workers with signal handlers
  • Common leak patterns in Celery tasks and how to fix them
  • Production-ready monitoring strategies

Related: FastAPI BackgroundTasks vs Celery

The Challenge of Memory Leaks in Long-Running Processes

Memory leaks are a classic problem in software engineering, particularly for long-running processes like Celery workers. Unlike short-lived scripts, workers process thousands or millions of tasks over days or weeks, causing small memory allocations to accumulate until they become critical.

Historically, detecting these leaks required specialized tools like valgrind for C/C++ or external Python profilers like objgraph. However, these tools often introduce significant overhead or require complex setup. Python’s tracemalloc module, introduced in Python 3.4 (PEP 454), provides a lightweight, standard-library alternative well suited to tracking memory allocations in production environments.

The core principle: take snapshots of memory usage at different points, compare them to identify growth, and trace allocations back to specific lines of code. This approach works particularly well for Celery workers, where memory growth often correlates with the number of tasks processed.

Celery Memory Leak Symptoms

Long-running workers (no --max-tasks-per-child limit) handle 100k+ tasks but:

  • RSS/PSS grows unbounded → OOM kills
  • htop: 10GB+ after weeks
  • Flower: tasks slow, timeouts

Leak example (tasks.py):

leaky_list = []  # Module-level global: survives across tasks!

@app.task
def process_data():
    leaky_list.append("data" * 50_000)  # ~200 KB per call, never cleared
    return len(leaky_list)

To run the worker without child-process recycling (the default when --max-tasks-per-child is not set):

$ celery -A app worker --loglevel=info

After 10k tasks, RSS reaches roughly 1.9 GB.

tracemalloc: Stdlib Memory Tracer

tracemalloc tracks every allocation by filename:lineno.

Alternative Memory Profiling Tools

Before diving into tracemalloc, it’s worth noting other options available for Python memory profiling:

  1. objgraph: Excellent for visualizing object references and finding reference cycles, but requires manual inspection and doesn’t track allocations over time as easily.
  2. memory_profiler: Provides line-by-line memory usage but can be slow and intrusive, making it less suitable for production.
  3. filprofiler: Focuses on memory allocation in data processing pipelines (like Pandas/NumPy), great for data science workloads but overkill for general Celery tasks.
  4. tracemalloc: Our choice for this guide because it’s built-in, low-overhead, and provides historical snapshot comparisons—ideal for detecting leaks in long-running workers.

For Celery workers specifically, tracemalloc strikes the right balance between detail and performance.


Basic usage:

import tracemalloc

tracemalloc.start(10)  # store up to 10 frames per allocation traceback

snapshot1 = tracemalloc.take_snapshot()
# ... run leaky tasks ...
snapshot2 = tracemalloc.take_snapshot()

# Sort by per-line size difference: leaks float to the top
stats = snapshot2.compare_to(snapshot1, 'lineno')
for stat in stats[:5]:
    print(stat)

Output:

tasks.py:8: size=1.9 GiB (+1.9 GiB), count=10000 (+10000), average=199.9 KiB

Pinpoints leaky_list.append().

Integrating tracemalloc in Celery Workers

Method 1: Worker Signal Handler (Production-Ready)

Handle SIGUSR1 to dump on-demand.

app/worker_signals.py:

import os
import signal
import tracemalloc

from celery.signals import worker_process_init

def dump_trace(sig, frame):
    snapshot = tracemalloc.take_snapshot()
    path = f"/tmp/celery-leak-{os.getpid()}.bin"
    snapshot.dump(path)  # pickled; reload later with Snapshot.load()
    print(f"Snapshot dumped: {path}")

@worker_process_init.connect
def init_tracemalloc(**kwargs):
    tracemalloc.start(10)  # 10 frames per traceback
    signal.signal(signal.SIGUSR1, dump_trace)  # per-child on-demand dump

celery.py (app module):

from . import worker_signals  # noqa: F401 — importing registers the handlers

Run it with (child processes are never recycled when --max-tasks-per-child is unset):

$ celery -A app worker -Q default

If you suspect a leak:

$ kill -USR1 <PID>

Method 2: Periodic Snapshots (Dev)

A per-process task counter that snapshots every 1000 tasks (assumes tracemalloc.start() already ran at worker init, as in Method 1):

import tracemalloc

task_count = 0  # per worker process

@app.task(bind=True)
def leaky_task(self):
    global task_count
    task_count += 1
    if task_count % 1000 == 0:
        snapshot = tracemalloc.take_snapshot()
        top = snapshot.statistics('lineno')[:3]
        self.update_state(state='PROGRESS', meta={'mem_top': [str(s) for s in top]})
    leaky_list.append("leak")  # the leak under investigation

Flower shows progress.

Analyzing Snapshots: Diffs & Filters

Load/compare:

snapshot1 = tracemalloc.Snapshot.load("/tmp/before.bin")
snapshot2 = tracemalloc.take_snapshot()
diff = snapshot2.compare_to(snapshot1, 'traceback')

for stat in diff[:10]:
    print(f"{stat.traceback.format()[-1]}: {stat.size_diff / 1024**2:.1f} MiB")

Filter out interpreter noise and keep only your application’s files (filter_traces returns a new Snapshot):

filtered = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<unknown>"),
    tracemalloc.Filter(True, "*/yourapp/*"),
))
top = filtered.statistics('lineno')[:5]

Common Celery Leaks & Fixes

| Leak Source | Symptom | tracemalloc Trace | Fix |
| --- | --- | --- | --- |
| Global lists | RSS grows linearly | list.append() | collections.deque(maxlen=1000) |
| DB connections | Connection pool exhaustion | psycopg2.connect() with no close() | contextlib.closing() or a pool |
| Pandas DataFrames | 1 GB+ objects | pd.DataFrame() | del df; gc.collect() |
| Thread locals | Per-task growth | threading.local() | Clear after each task |
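For the connection-leak row, wrapping the connection in contextlib.closing guarantees cleanup even when the task raises. A sketch using stdlib sqlite3 as a stand-in for psycopg2 (the pattern is the same; fetch_count is an illustrative name):

```python
import sqlite3
from contextlib import closing

def fetch_count(db_path=":memory:"):
    # closing() calls conn.close() on exit, even on exceptions,
    # so no connection object outlives the task
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER)")
        conn.execute("INSERT INTO items VALUES (1)")
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

print(fetch_count())  # → 1
```

In production you would typically prefer a connection pool over per-task connections, but closing() is the minimal fix for the leak itself.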

Fixed task:

from collections import deque

leaky_list = deque(maxlen=1000)  # bounded: oldest entries are evicted

@app.task
def safe_task():
    leaky_list.append("data")
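Why deque(maxlen=...) bounds memory: once the deque is full, each append evicts the oldest element, so the retained size stays constant no matter how many tasks run. Quick check:

```python
from collections import deque

bounded = deque(maxlen=1000)
for i in range(10_000):
    bounded.append(i)  # after 1000 items, each append drops the oldest

print(len(bounded))  # → 1000
print(bounded[0])    # → 9000 (only the most recent 1000 survive)
```

If the task genuinely needs unbounded history, it belongs in external storage (database, Redis), not in worker memory.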

Benchmarks: RSS Over 10k Tasks

psutil monitor (monitor.py):

import sys
import time

import psutil

# Attach to the worker process; pass the worker PID as the first argument
p = psutil.Process(int(sys.argv[1]))
while True:
    print(f"RSS: {p.memory_info().rss / 1024**2:.0f} MiB")
    time.sleep(10)

Run it in a separate terminal:

$ python monitor.py <worker-pid>
| Version | 1k Tasks | 5k | 10k | Stable? |
| --- | --- | --- | --- | --- |
| Leaky | 210 MB | 1.1 GB | 2.0 GB | No |
| Fixed | 210 MB | 211 MB | 210 MB | Yes |

Use --max-tasks-per-child=1000 as a temporary workaround.

Tip: Restarting every 1000 tasks bounds leaks but incurs process startup overhead—use while fixing root causes.

Production: Flower + Alerts

  • Flower: --events + custom mem plugin
  • Prometheus: celery_worker_memory_bytes
  • Restart policy: --max-tasks-per-child=5000 as insurance, alongside the actual leak fix

Conclusion

Memory leaks in long-running Celery workers can lead to production outages and performance degradation. Python’s tracemalloc module provides a powerful, dependency-free solution for identifying these leaks with precision.

Key Takeaways:

  • Use tracemalloc snapshots to compare memory usage before and after task execution
  • Implement signal handlers (like SIGUSR1) for production-safe memory dumps
  • Fix common patterns: replace global lists with collections.deque, use context managers for DB connections, and clear thread-local storage
  • Monitor continuously using tools like psutil and Flower to track RSS growth over time

Next Steps:

  • Implement the signal handler pattern in your Celery workers
  • Set up periodic memory profiling during development
  • Consider objgraph for detecting reference cycles if tracemalloc shows persistent leaks

The tools and techniques in this guide require no external dependencies and work with Python 3.4+. Start monitoring your workers today to catch memory issues before they become critical failures.


Sponsored by Durable Programming

Need help maintaining or upgrading your Python application? Durable Programming specializes in keeping Python apps secure, performant, and up-to-date.

Hire Durable Programming