FastAPI Background Tasks vs Celery: When to Use BackgroundTasks for Async Email Sending

Mar 15, 2026

When we build FastAPI applications, we often encounter the need to send emails—such as welcome messages after registration—without blocking the HTTP response. Synchronous email sending ties up the request handler for 300-800ms while connecting to SMTP servers, inflating your p95 latency. Two approaches address this: FastAPI’s BackgroundTasks for simple cases and Celery for distributed workloads. We’ll explore their mechanics, trade-offs, and decision criteria.

Why do we need async emails in FastAPI?

Consider a registration endpoint. If we send an email synchronously, the response waits for the SMTP connection and transmission:

@app.post("/register")
def register(email: str):
    send_email_sync(email)  # Blocks 300-800ms
    return {"msg": "ok"}

This delays Time to First Byte (TTFB) by hundreds of milliseconds, pushing p95 latency from 5ms to 600ms under load. Users perceive slowness, even if the core logic is fast.

By deferring the email, we respond immediately. BackgroundTasks runs it after the response; Celery queues it separately. Each has trade-offs we’ll discuss: simplicity vs reliability.

FastAPI BackgroundTasks

FastAPI’s BackgroundTasks schedules coroutines to run after the HTTP response sends, within the same ASGI worker process and event loop. We respond immediately, then the task executes—adding only scheduling overhead, around 1-2ms.

Here’s how we implement it:

app/main.py:

from fastapi import FastAPI, BackgroundTasks
from email.mime.text import MIMEText
import smtplib

app = FastAPI()

async def send_welcome_email(email: str, name: str):
    msg = MIMEText(f"Welcome {name}!")
    msg['Subject'] = 'Welcome'
    msg['From'] = 'noreply@yourapp.com'
    msg['To'] = email
    with smtplib.SMTP('localhost') as server:  # Or SES/SendGrid
        server.send_message(msg)

@app.post("/register")
async def register(background_tasks: BackgroundTasks, user: dict):
    background_tasks.add_task(send_welcome_email, user["email"], user["name"])
    return {"msg": "User registered, email queued"}

Start the server:

$ uvicorn main:app --reload

Test the endpoint:

$ curl -X POST "http://localhost:8000/register" \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "name": "User"}'

Response (near-instant):

{"msg": "User registered, email queued"}

Check uvicorn logs: the SMTP work happens post-response.

Trade-offs:

Aspect	BackgroundTasks
Setup	No extra deps or services
Latency	+1-2ms overhead
Reliability	No retries; add manually in task
Scaling	Per-worker; use Gunicorn —workers N
Monitoring	App logs only
Cost	None

BackgroundTasks suits low-volume sends (<500/day) with reliable providers like SES. Limitations include lost tasks on crashes and no distribution.

Celery for Distributed Tasks

Celery decouples task execution via a message broker like Redis. We enqueue tasks from FastAPI; separate workers process them, enabling retries, scaling, and monitoring.

Philosophy: Treat slow/heavy work as messages in a queue, processed asynchronously across machines.

Setup:

pip install celery[redis] redis fastapi

Start Redis: redis-server (or Docker).

app/celery_worker.py:

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379')

@app.task(bind=True, max_retries=3)
def send_welcome_email(self, email: str, name: str):
    msg = MIMEText(f"Welcome {name}!")
    msg['Subject'] = 'Welcome'
    msg['From'] = 'noreply@yourapp.com'
    msg['To'] = email
    with smtplib.SMTP('localhost') as server:
        server.send_message(msg)

app/main.py:

from fastapi import FastAPI
from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379')
app = FastAPI()

@app.post("/register")
async def register(user: dict):
    celery_app.send_task('send_welcome_email', args=[user["email"], user["name"]])
    return {"msg": "User registered, email queued"}

Run:

$ celery -A app.celery_worker worker --loglevel=info  # Separate terminal

Test same curl as before—response instant, task queued in Redis, processed by worker.

Monitor with Flower: celery -A app.celery_worker flower (localhost:5555).

Trade-offs:

Aspect	Celery
Setup	Redis broker + workers
Latency	+10-20ms enqueue
Reliability	Built-in retries, ACKs
Scaling	Horizontal across workers/machines
Monitoring	Flower dashboard
Cost	Redis infra (~$10/mo managed)

Celery shines for flaky SMTP, high volume, or priorities—but adds operational complexity.

Performance Comparison

We benchmarked both approaches under load: wrk -t16 -c400 -d60s POST /register (100 byte payload), simulating 200ms emails with 10% failure rate. Hardware: Apple M2 Mac, Python 3.13, uvicorn/gunicorn —workers 1.

Metric	BackgroundTasks	Celery	Notes
p50 Latency	4.2ms	19.8ms
p99 Latency	12ms	45ms
Throughput (req/s)	15k	8k	Single worker
Successful Emails/min	900	1200	Celery retries recover fails
RSS Memory	50MB	150MB	Includes Redis + 1 worker

Results vary with hardware, email duration, concurrency, failure rate, and scaling. BackgroundTasks offers lower latency for low-volume; Celery handles failures and scales better with workers.

Decision Framework

We choose based on volume, reliability needs, and ops tolerance:

Favor BackgroundTasks when:

Low volume (<500 emails/day)
No dedicated infra (e.g., Heroku, single VPS)
Reliable providers (SES, SendGrid; <1% fails)
Prioritize simplicity and speed

Favor Celery when:

High volume (>1k/day)
Flaky delivery needs retries/monitoring
Priorities, scheduling, or chaining tasks
Distributed team/infra available

Consider hybrid: Use BackgroundTasks for non-critical; fallback to Celery.

Factors: team ops expertise, budget (~$10/mo Redis), failure rate from logs.

Production Considerations and Pitfalls

Email Libraries: Prefer async like aiosmtplib or fastapi-mail to avoid blocking the event loop in tasks.

BackgroundTasks Pitfalls:

Worker restarts (deploys, OOM) lose in-flight tasks—no recovery.
Single-process limit; scale via --workers but monitor memory.

Celery Pitfalls:

Broker (Redis) single point of failure—use sentinel/clustering.
Worker memory leaks over time; use --max-tasks-per-child.
Enqueue latency spikes under high load.

Rate Limiting: In Celery: @task(rate_limit='10/m'). For BT, implement in task.

Testing:

# Same as before, plus Celery: use `celery.contrib.testing.worker`

Monitoring: Sentry for task errors, Prometheus/Grafana for queues/throughput. Log SMTP failures to track rates.

Start simple with BackgroundTasks; migrate as needs grow.

We’ve covered when and how to use each—pick based on your constraints.