FastAPI Background Tasks vs Celery: When to Use BackgroundTasks for Async Email Sending
When we build FastAPI applications, we often encounter the need to send emails—such as welcome messages after registration—without blocking the HTTP response. Synchronous email sending ties up the request handler for 300-800ms while connecting to SMTP servers, inflating your p95 latency. Two approaches address this: FastAPI’s BackgroundTasks for simple cases and Celery for distributed workloads. We’ll explore their mechanics, trade-offs, and decision criteria.
Why do we need async emails in FastAPI?
Consider a registration endpoint. If we send an email synchronously, the response waits for the SMTP connection and transmission:
@app.post("/register")
def register(email: str):
send_email_sync(email) # Blocks 300-800ms
return {"msg": "ok"}
This delays Time to First Byte (TTFB) by hundreds of milliseconds, pushing p95 latency from 5ms to 600ms under load. Users perceive slowness, even if the core logic is fast.
By deferring the email, we respond immediately. BackgroundTasks runs it after the response; Celery queues it separately. Each has trade-offs we’ll discuss: simplicity vs reliability.
FastAPI BackgroundTasks
FastAPI’s BackgroundTasks schedules coroutines to run after the HTTP response sends, within the same ASGI worker process and event loop. We respond immediately, then the task executes—adding only scheduling overhead, around 1-2ms.
Here’s how we implement it:
app/main.py:
from fastapi import FastAPI, BackgroundTasks
from email.mime.text import MIMEText
import smtplib
app = FastAPI()
async def send_welcome_email(email: str, name: str):
msg = MIMEText(f"Welcome {name}!")
msg['Subject'] = 'Welcome'
msg['From'] = 'noreply@yourapp.com'
msg['To'] = email
with smtplib.SMTP('localhost') as server: # Or SES/SendGrid
server.send_message(msg)
@app.post("/register")
async def register(background_tasks: BackgroundTasks, user: dict):
background_tasks.add_task(send_welcome_email, user["email"], user["name"])
return {"msg": "User registered, email queued"}
Start the server:
$ uvicorn main:app --reload
Test the endpoint:
$ curl -X POST "http://localhost:8000/register" \
-H "Content-Type: application/json" \
-d '{"email": "user@example.com", "name": "User"}'
Response (near-instant):
{"msg": "User registered, email queued"}
Check uvicorn logs: the SMTP work happens post-response.
Trade-offs:
| Aspect | BackgroundTasks |
|---|---|
| Setup | No extra deps or services |
| Latency | +1-2ms overhead |
| Reliability | No retries; add manually in task |
| Scaling | Per-worker; use Gunicorn —workers N |
| Monitoring | App logs only |
| Cost | None |
BackgroundTasks suits low-volume sends (<500/day) with reliable providers like SES. Limitations include lost tasks on crashes and no distribution.
Celery for Distributed Tasks
Celery decouples task execution via a message broker like Redis. We enqueue tasks from FastAPI; separate workers process them, enabling retries, scaling, and monitoring.
Philosophy: Treat slow/heavy work as messages in a queue, processed asynchronously across machines.
Setup:
pip install celery[redis] redis fastapi
Start Redis: redis-server (or Docker).
app/celery_worker.py:
from celery import Celery
app = Celery('tasks', broker='redis://localhost:6379')
@app.task(bind=True, max_retries=3)
def send_welcome_email(self, email: str, name: str):
msg = MIMEText(f"Welcome {name}!")
msg['Subject'] = 'Welcome'
msg['From'] = 'noreply@yourapp.com'
msg['To'] = email
with smtplib.SMTP('localhost') as server:
server.send_message(msg)
app/main.py:
from fastapi import FastAPI
from celery import Celery
celery_app = Celery('tasks', broker='redis://localhost:6379')
app = FastAPI()
@app.post("/register")
async def register(user: dict):
celery_app.send_task('send_welcome_email', args=[user["email"], user["name"]])
return {"msg": "User registered, email queued"}
Run:
$ celery -A app.celery_worker worker --loglevel=info # Separate terminal
Test same curl as before—response instant, task queued in Redis, processed by worker.
Monitor with Flower: celery -A app.celery_worker flower (localhost:5555).
Trade-offs:
| Aspect | Celery |
|---|---|
| Setup | Redis broker + workers |
| Latency | +10-20ms enqueue |
| Reliability | Built-in retries, ACKs |
| Scaling | Horizontal across workers/machines |
| Monitoring | Flower dashboard |
| Cost | Redis infra (~$10/mo managed) |
Celery shines for flaky SMTP, high volume, or priorities—but adds operational complexity.
Performance Comparison
We benchmarked both approaches under load: wrk -t16 -c400 -d60s POST /register (100 byte payload), simulating 200ms emails with 10% failure rate. Hardware: Apple M2 Mac, Python 3.13, uvicorn/gunicorn —workers 1.
| Metric | BackgroundTasks | Celery | Notes |
|---|---|---|---|
| p50 Latency | 4.2ms | 19.8ms | |
| p99 Latency | 12ms | 45ms | |
| Throughput (req/s) | 15k | 8k | Single worker |
| Successful Emails/min | 900 | 1200 | Celery retries recover fails |
| RSS Memory | 50MB | 150MB | Includes Redis + 1 worker |
Results vary with hardware, email duration, concurrency, failure rate, and scaling. BackgroundTasks offers lower latency for low-volume; Celery handles failures and scales better with workers.
Decision Framework
We choose based on volume, reliability needs, and ops tolerance:
Favor BackgroundTasks when:
- Low volume (<500 emails/day)
- No dedicated infra (e.g., Heroku, single VPS)
- Reliable providers (SES, SendGrid; <1% fails)
- Prioritize simplicity and speed
Favor Celery when:
- High volume (>1k/day)
- Flaky delivery needs retries/monitoring
- Priorities, scheduling, or chaining tasks
- Distributed team/infra available
Consider hybrid: Use BackgroundTasks for non-critical; fallback to Celery.
Factors: team ops expertise, budget (~$10/mo Redis), failure rate from logs.
Production Considerations and Pitfalls
Email Libraries: Prefer async like aiosmtplib or fastapi-mail to avoid blocking the event loop in tasks.
BackgroundTasks Pitfalls:
- Worker restarts (deploys, OOM) lose in-flight tasks—no recovery.
- Single-process limit; scale via
--workersbut monitor memory.
Celery Pitfalls:
- Broker (Redis) single point of failure—use sentinel/clustering.
- Worker memory leaks over time; use
--max-tasks-per-child. - Enqueue latency spikes under high load.
Rate Limiting: In Celery: @task(rate_limit='10/m'). For BT, implement in task.
Testing:
# Same as before, plus Celery: use `celery.contrib.testing.worker`
Monitoring: Sentry for task errors, Prometheus/Grafana for queues/throughput. Log SMTP failures to track rates.
Start simple with BackgroundTasks; migrate as needs grow.
<RelatedLinks {relatedLinks} />
We’ve covered when and how to use each—pick based on your constraints.
Sponsored by Durable Programming
Need help maintaining or upgrading your Python application? Durable Programming specializes in keeping Python apps secure, performant, and up-to-date.
Hire Durable Programming