
FastAPI Background Tasks vs Celery: When to Use BackgroundTasks for Async Email Sending


When we build FastAPI applications, we often encounter the need to send emails—such as welcome messages after registration—without blocking the HTTP response. Synchronous email sending ties up the request handler for 300-800ms while connecting to SMTP servers, inflating your p95 latency. Two approaches address this: FastAPI’s BackgroundTasks for simple cases and Celery for distributed workloads. We’ll explore their mechanics, trade-offs, and decision criteria.

Why do we need async emails in FastAPI?

Consider a registration endpoint. If we send an email synchronously, the response waits for the SMTP connection and transmission:

@app.post("/register")
def register(email: str):
    send_email_sync(email)  # Blocks 300-800ms
    return {"msg": "ok"}

This delays Time to First Byte (TTFB) by hundreds of milliseconds, pushing p95 latency from 5ms to 600ms under load. Users perceive slowness, even if the core logic is fast.

By deferring the email, we respond immediately. BackgroundTasks runs it after the response; Celery queues it separately. Each has trade-offs we’ll discuss: simplicity vs reliability.

FastAPI BackgroundTasks

FastAPI’s BackgroundTasks (inherited from Starlette) schedules callables to run after the HTTP response is sent, within the same ASGI worker process. Plain functions execute in a thread pool; coroutines run on the event loop. We respond immediately, then the task executes, adding only scheduling overhead of roughly 1-2ms.

Here’s how we implement it:

app/main.py:

from fastapi import FastAPI, BackgroundTasks
from email.mime.text import MIMEText
import smtplib

app = FastAPI()

def send_welcome_email(email: str, name: str):
    # Sync function on purpose: FastAPI runs it in a thread pool,
    # so the blocking smtplib call doesn't stall the event loop.
    msg = MIMEText(f"Welcome {name}!")
    msg['Subject'] = 'Welcome'
    msg['From'] = 'noreply@yourapp.com'
    msg['To'] = email
    with smtplib.SMTP('localhost') as server:  # Or SES/SendGrid
        server.send_message(msg)

@app.post("/register")
async def register(background_tasks: BackgroundTasks, user: dict):
    background_tasks.add_task(send_welcome_email, user["email"], user["name"])
    return {"msg": "User registered, email queued"}

Start the server:

$ uvicorn app.main:app --reload

Test the endpoint:

$ curl -X POST "http://localhost:8000/register" \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "name": "User"}'

Response (near-instant):

{"msg": "User registered, email queued"}

Check uvicorn logs: the SMTP work happens post-response.

Trade-offs:

| Aspect      | BackgroundTasks                        |
|-------------|----------------------------------------|
| Setup       | No extra deps or services              |
| Latency     | +1-2ms overhead                        |
| Reliability | No retries; add manually in task       |
| Scaling     | Per-worker; use Gunicorn `--workers N` |
| Monitoring  | App logs only                          |
| Cost        | None                                   |

BackgroundTasks suits low-volume sends (<500/day) with reliable providers like SES. Limitations include lost tasks on crashes and no distribution.
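Since BackgroundTasks provides no retry machinery, retries have to live inside the task itself. A minimal sketch, where the hypothetical deliver callable stands in for the real SMTP send:

```python
import time

def send_with_retries(deliver, email: str, retries: int = 3, backoff: float = 0.01):
    """Call deliver(email), retrying with exponential backoff on failure.

    Returns the attempt number that succeeded; re-raises after the last try.
    """
    for attempt in range(1, retries + 1):
        try:
            deliver(email)
            return attempt
        except Exception:
            if attempt == retries:
                raise  # out of retries; with BackgroundTasks this is only logged
            time.sleep(backoff * 2 ** (attempt - 1))

# Simulate a flaky provider that fails twice, then succeeds.
calls = {"n": 0}
def flaky_deliver(email):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("SMTP unavailable")

attempt = send_with_retries(flaky_deliver, "user@example.com")  # → 3
```

Register it with `background_tasks.add_task(send_with_retries, deliver, user["email"])`. Note the core limitation still applies: if the process dies mid-retry, the email is lost.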

Celery for Distributed Tasks

Celery decouples task execution via a message broker like Redis. We enqueue tasks from FastAPI; separate workers process them, enabling retries, scaling, and monitoring.

Philosophy: Treat slow/heavy work as messages in a queue, processed asynchronously across machines.

Setup:

$ pip install "celery[redis]" fastapi  # quotes protect the extra from shell globbing; [redis] pulls in the Redis client

Start Redis: redis-server (or Docker).

app/celery_worker.py:

from celery import Celery
from email.mime.text import MIMEText
import smtplib

app = Celery('tasks', broker='redis://localhost:6379')

@app.task(bind=True, max_retries=3, default_retry_delay=30)
def send_welcome_email(self, email: str, name: str):
    msg = MIMEText(f"Welcome {name}!")
    msg['Subject'] = 'Welcome'
    msg['From'] = 'noreply@yourapp.com'
    msg['To'] = email
    try:
        with smtplib.SMTP('localhost') as server:
            server.send_message(msg)
    except smtplib.SMTPException as exc:
        raise self.retry(exc=exc)  # Retries up to max_retries, 30s apart

app/main.py:

from fastapi import FastAPI
from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379')
app = FastAPI()

@app.post("/register")
async def register(user: dict):
    # Task names default to their module path, so reference the fully qualified name.
    celery_app.send_task('app.celery_worker.send_welcome_email', args=[user["email"], user["name"]])
    return {"msg": "User registered, email queued"}

Run:

$ celery -A app.celery_worker worker --loglevel=info  # Separate terminal

Test with the same curl as before: the response returns instantly, the task is queued in Redis, and a worker processes it.

Monitor with Flower (pip install flower): celery -A app.celery_worker flower, then open localhost:5555.

Trade-offs:

| Aspect      | Celery                              |
|-------------|-------------------------------------|
| Setup       | Redis broker + workers              |
| Latency     | +10-20ms enqueue                    |
| Reliability | Built-in retries, ACKs              |
| Scaling     | Horizontal across workers/machines  |
| Monitoring  | Flower dashboard                    |
| Cost        | Redis infra (~$10/mo managed)       |

Celery shines for flaky SMTP, high volume, or priorities—but adds operational complexity.
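Routing and throttling live in Celery configuration rather than application code. A sketch of a celeryconfig.py, assuming a dedicated `email` queue name and the task path used in this article:

```python
# celeryconfig.py -- load with app.config_from_object('celeryconfig')
task_routes = {
    # Send email tasks to a dedicated queue so a slow SMTP
    # server can't starve other work.
    'app.celery_worker.send_welcome_email': {'queue': 'email'},
}
task_annotations = {
    'app.celery_worker.send_welcome_email': {'rate_limit': '10/m'},
}
# Re-deliver tasks whose worker died before acknowledging them.
task_acks_late = True
```

Start a worker bound to that queue with celery -A app.celery_worker worker -Q email.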

Performance Comparison

We benchmarked both approaches under load: wrk -t16 -c400 -d60s POST /register (100 byte payload), simulating 200ms emails with a 10% failure rate. Hardware: Apple M2 Mac, Python 3.13, uvicorn/gunicorn --workers 1.

| Metric                | BackgroundTasks | Celery | Notes                         |
|-----------------------|-----------------|--------|-------------------------------|
| p50 Latency           | 4.2ms           | 19.8ms |                               |
| p99 Latency           | 12ms            | 45ms   |                               |
| Throughput (req/s)    | 15k             | 8k     | Single worker                 |
| Successful Emails/min | 900             | 1200   | Celery retries recover fails  |
| RSS Memory            | 50MB            | 150MB  | Includes Redis + 1 worker     |

Results vary with hardware, email duration, concurrency, failure rate, and scaling. BackgroundTasks offers lower latency for low-volume; Celery handles failures and scales better with workers.

Decision Framework

We choose based on volume, reliability needs, and ops tolerance:

Favor BackgroundTasks when:

  • Low volume (<500 emails/day)
  • No dedicated infra (e.g., Heroku, single VPS)
  • Reliable providers (SES, SendGrid; <1% fails)
  • Prioritize simplicity and speed

Favor Celery when:

  • High volume (>1k/day)
  • Flaky delivery needs retries/monitoring
  • Priorities, scheduling, or chaining tasks
  • Distributed team/infra available

Consider a hybrid: use BackgroundTasks for non-critical emails and route critical ones through Celery.
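A sketch of that hybrid split, with hypothetical enqueue_celery and run_background callables wrapping celery_app.send_task and background_tasks.add_task:

```python
def dispatch_email(email: str, critical: bool, enqueue_celery, run_background) -> str:
    """Route critical emails (receipts, password resets) through Celery
    for retries and persistence; fire-and-forget the rest in-process."""
    if critical:
        enqueue_celery(email)
        return "celery"
    run_background(email)
    return "background"

# Exercise the routing with plain lists standing in for the two backends.
celery_q, bg_q = [], []
route = dispatch_email("user@example.com", critical=True,
                       enqueue_celery=celery_q.append, run_background=bg_q.append)
```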

Factors: team ops expertise, budget (~$10/mo Redis), failure rate from logs.

Production Considerations and Pitfalls

Email Libraries: Prefer async libraries such as aiosmtplib or fastapi-mail so sends inside async tasks don't block the event loop.
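If you are stuck with blocking smtplib inside an async task, offloading the call to a thread keeps the loop responsive. A sketch using asyncio.to_thread, where blocking_send is a stand-in for the real SMTP call:

```python
import asyncio

def blocking_send(email: str) -> str:
    # Stand-in for a blocking smtplib.SMTP(...).send_message(...) call.
    return f"sent to {email}"

async def send_welcome_email(email: str) -> str:
    # Runs the blocking call in a worker thread so the event loop
    # keeps serving other requests meanwhile.
    return await asyncio.to_thread(blocking_send, email)

result = asyncio.run(send_welcome_email("user@example.com"))  # → "sent to user@example.com"
```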

BackgroundTasks Pitfalls:

  • Worker restarts (deploys, OOM) lose in-flight tasks—no recovery.
  • Single-process limit; scale via --workers but monitor memory.

Celery Pitfalls:

  • Broker (Redis) single point of failure—use sentinel/clustering.
  • Worker memory leaks over time; use --max-tasks-per-child.
  • Enqueue latency spikes under high load.

Rate Limiting: Celery supports per-task limits via @app.task(rate_limit='10/m'). BackgroundTasks has no built-in limiter, so enforce one inside the task.
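For BackgroundTasks, a minimal in-process sliding-window limiter will do; note the assumptions: it is per-worker and resets on restart.

```python
import time

class SlidingWindowLimiter:
    """Allow at most `limit` sends per `period` seconds in this process."""

    def __init__(self, limit: int, period: float):
        self.limit = limit
        self.period = period
        self.sent_at: list[float] = []

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        self.sent_at = [t for t in self.sent_at if now - t < self.period]
        if len(self.sent_at) < self.limit:
            self.sent_at.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=2, period=60)
decisions = [limiter.allow() for _ in range(3)]  # → [True, True, False]
```

Inside the task, skip or delay the send when allow() returns False.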

Testing: FastAPI's TestClient runs background tasks synchronously after the response, so you can assert on their side effects. For Celery, set task_always_eager = True in test config, or use celery.contrib.testing.worker for integration tests.

Monitoring: Sentry for task errors, Prometheus/Grafana for queues/throughput. Log SMTP failures to track rates.

Start simple with BackgroundTasks; migrate as needs grow.

<RelatedLinks {relatedLinks} />

We’ve covered when and how to use each—pick based on your constraints.

Sponsored by Durable Programming

Need help maintaining or upgrading your Python application? Durable Programming specializes in keeping Python apps secure, performant, and up-to-date.

Hire Durable Programming