How to Reduce FastAPI JSON Response Time by 40% Using orjson Instead of stdlib json
When a FastAPI application returns large JSON payloads, the default serializer (Python's standard json module) can become a bottleneck. orjson, a Rust-based alternative, typically serializes 4-10x faster. In our benchmarks, with 1,000 concurrent connections fetching lists of 10,000 objects, p50 response time dropped from 28ms to 17ms and p99 from 45ms to 27ms: roughly a 40% reduction at both percentiles.
Why Stdlib JSON Becomes a Bottleneck in FastAPI
FastAPI's JSONResponse relies on Python's json module by default. That is fine for small responses, but serialization cost grows with payload size, and FastAPI must first run jsonable_encoder to convert UUIDs, datetimes, and models into JSON-safe types. orjson avoids much of this overhead with an optimized Rust implementation that natively serializes dataclasses, datetimes, and NumPy arrays, while strictly following RFC 8259.
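As a quick sanity check outside FastAPI, a minimal timeit sketch (the payload shape here is an assumption, not the exact benchmark payload) compares the two serializers directly; it degrades gracefully if orjson is not installed:

```python
import json
import timeit

# A flat list of dicts, loosely mimicking an API payload
payload = [{"id": str(i), "value": i / 1000.0} for i in range(10_000)]

t_std = timeit.timeit(lambda: json.dumps(payload), number=20)
try:
    import orjson

    t_or = timeit.timeit(lambda: orjson.dumps(payload), number=20)
    print(f"stdlib: {t_std:.3f}s  orjson: {t_or:.3f}s  speedup: {t_std / t_or:.1f}x")
except ImportError:
    print(f"stdlib: {t_std:.3f}s (install orjson to compare)")
```

Exact numbers vary by machine and payload shape, which is why the full wrk benchmark below measures end-to-end latency rather than serializer time alone.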
| Serializer | Serialize Speed | Deserialize | Large Payload (10k objs) | FastAPI Integration |
|---|---|---|---|---|
| std json | 1x (baseline) | 1x | 28ms p50, 45ms p99 | Default |
| orjson | up to 4-10x faster | up to 2x faster | 17ms p50, 27ms p99 | Custom class or Pydantic v2 |
| ujson | ~2x | ~1.5x | ~22ms p50 | Manual |
While orjson excels at speed for standard data, consider its trade-offs. It follows RFC 8259 strictly: it serializes NaN and Infinity as null (where stdlib json emits the non-standard literals NaN and Infinity), raises on circular references, and rejects non-string dict keys unless OPT_NON_STR_KEYS is set. For data that relies on stdlib's leniency, ujson offers roughly 2x stdlib speed with fewer restrictions. orjson ships prebuilt wheels for most platforms, but building from source requires a Rust toolchain; ujson uses C and builds almost anywhere.
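Stdlib's leniency is easy to demonstrate with stdlib alone: by default json.dumps emits the non-RFC literal NaN, and allow_nan=False reproduces strict RFC 8259 behavior (a sketch of the compatibility difference; orjson itself serializes NaN as null rather than raising):

```python
import json
import math

# Default stdlib behavior: emits a literal that RFC 8259 does not allow
print(json.dumps(float("nan")))  # prints: NaN

# Strict mode mirrors the RFC: non-finite floats are refused outright
try:
    json.dumps(math.inf, allow_nan=False)
except ValueError as exc:
    print("rejected:", exc)
```

If your payloads can contain non-finite floats, decide up front whether null, an error, or the lenient literal is the behavior your clients expect.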
Reproducing the Benchmarks
```shell
# Install Python dependencies
pip install fastapi uvicorn orjson py-spy

# wrk is a standalone C tool, not a Python package;
# install it separately (e.g. brew install wrk, or your distro's package manager)

# Run the benchmark
wrk -t16 -c1000 -d30s -s post.lua http://localhost:8000/api/data
```
post.lua (despite the name, it configures a GET request with a JSON Accept header):

```lua
wrk.method = "GET"
wrk.headers["Accept"] = "application/json"
wrk.path = "/api/data"
```
Server payload: A list of 10,000 items, each a dict with UUIDs, datetimes, and floats. We chose this shape to mimic realistic API data: types that the standard json module cannot serialize without an encoding step, but that orjson handles natively.
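To see why this payload shape stresses the default path, note that stdlib json cannot serialize UUIDs or datetimes at all without a conversion hook (FastAPI runs jsonable_encoder to do this for you); a minimal sketch:

```python
import json
import uuid
from datetime import datetime, timezone

item = {"id": uuid.uuid4(), "timestamp": datetime.now(timezone.utc), "value": 0.5}

# json.dumps raises TypeError on UUID/datetime unless we supply a converter;
# default=str is the simplest (if lossy) hook
encoded = json.dumps(item, default=str)
print(encoded)
```

That per-object conversion work is exactly the overhead a native serializer can skip.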
We measured performance with wrk (`wrk -t16 -c1000 -d30s -s post.lua http://localhost:8000/api/data`) on an Apple M2 Mac using Python 3.13 and uvicorn. These settings simulate moderate production load: 16 threads and 1,000 concurrent connections over 30 seconds. Latency percentiles (p50, p99) show typical and tail response times, while req/s indicates throughput.
A Standard FastAPI App
Let’s start with this baseline app, saved as app_std.py:
```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from datetime import datetime
import uuid

app = FastAPI()

class DataItem(BaseModel):
    id: uuid.UUID
    timestamp: datetime
    value: float

@app.get("/api/data")
async def get_data() -> list[DataItem]:
    return [
        DataItem(id=uuid.uuid4(), timestamp=datetime.now(), value=i / 1000.0)
        for i in range(10000)
    ]

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
When we ran wrk and profiled with py-spy, we got these results:
| Metric | Value |
|---|---|
| p50 Lat | 28ms |
| p99 Lat | 45ms |
| Req/s | 8,200 |
| CPU (py-spy) | 45% json.dumps |
In the py-spy profile, json.dumps and FastAPI's encoding path together consumed about 45% of CPU time.
Installing and Using orjson
```shell
# quote the version specifier so the shell does not treat > as a redirect
pip install "orjson>=3.10.0"
```
First, create a custom ORJSONResponse class for full control, including indentation options:
```python
import orjson
from fastapi.responses import Response
from typing import Any

class ORJSONResponse(Response):
    media_type = "application/json"

    def render(self, content: Any) -> bytes:
        # note: orjson.dumps takes `option`, not `options`
        return orjson.dumps(
            content,
            option=orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_INDENT_2,  # OPT_INDENT_2 pretty-prints at some speed cost
        )
```
Then, use it in app_orjson.py by setting it as the default response class:
```python
from fastapi import FastAPI
# ... same DataItem model as before

app = FastAPI(default_response_class=ORJSONResponse)  # applies globally

# ... same /api/data endpoint

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
The results were:
| Metric | Std JSON | orjson | Improvement |
|---|---|---|---|
| p50 Lat | 28ms | 17ms | 40% |
| p99 Lat | 45ms | 27ms | 40% |
| Req/s | 8,200 | 13,500 | 65% |
| CPU | 45% | 18% | 60% less |
With py-spy, orjson serialization took less than 5% of CPU time.
Using Pydantic v2 with orjson
FastAPI uses Pydantic v2 by default, and Pydantic v2 serializes models through its own Rust core (pydantic-core). Endpoints that declare response models therefore already get Rust-speed serialization without a custom response class. (Note that installing orjson does not change Pydantic's own serializer; the gain here comes from pydantic-core itself.)
```toml
# pyproject.toml
dependencies = ["fastapi", "orjson"]
```
With app_pydantic.py using the same baseline code and no custom response class, we saw around a 30% performance improvement from Pydantic v2's Rust core.
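A minimal sketch of pydantic-core's direct-to-JSON path (the Item model is a stand-in for illustration; the block skips itself if Pydantic is not installed):

```python
# model_dump_json serializes straight to a JSON string in Rust,
# bypassing Python-level dict building plus json.dumps
try:
    from pydantic import BaseModel

    class Item(BaseModel):
        value: float

    raw = Item(value=0.5).model_dump_json()
    print(raw)  # compact JSON, e.g. {"value":0.5}
except ImportError:
    print("pydantic not installed")
```

This is the machinery FastAPI leans on when an endpoint declares a response model.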
For maximum performance, combine explicit response models with orjson.
Production Considerations
- For large payloads with non-string dict keys, try `orjson.OPT_NON_STR_KEYS`, but verify client compatibility.
- Scale: `uvicorn --workers 4 --limit-concurrency 1000`.
- Verify: `pytest` plus `locust` load tests.
- Monitor: `py-spy dump --pid <pid>` to spot hotspots.
- Fallback: use ujson if orjson's Rust build fails or for better compatibility with edge-case data.
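Putting a few of these together (the module name, worker count, and PID placeholder are assumptions for illustration):

```shell
# run the orjson app with multiple workers and a concurrency cap
uvicorn app_orjson:app --host 0.0.0.0 --port 8000 --workers 4 --limit-concurrency 1000

# in another terminal, snapshot a worker's stacks to spot serialization hot spots
py-spy dump --pid <worker-pid>
```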
You can reproduce these benchmarks in your own environment.
Sponsored by Durable Programming
Need help maintaining or upgrading your Python application? Durable Programming specializes in keeping Python apps secure, performant, and up-to-date.
Hire Durable Programming