How to Profile Flask Applications with py-spy Without Adding Code Instrumentation
Your Flask app is under load (around 200 req/s) and CPU is pegged. Instrumenting it with cProfile or profiling decorators means code changes and extra overhead in production. py-spy needs neither: it attaches to a live process by PID (find it with ps aux | grep gunicorn), py-spy top gives a live view of the hottest functions (in this walkthrough, about 90% of samples land in a NumPy loop), and py-spy record -o flame.svg writes a flame graph. Vectorizing that loop gives roughly a 5x speedup (about 1000 req/s).
Why py-spy for Flask? (Prod-Safe Sampling)
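cProfile: you have to wrap code in a profiler (cProfile.Profile) or launch the app with python -m cProfile, and it traces every function call. That is too intrusive for production.
py-spy: runs as a separate process and reads the target's memory to sample its call stacks (100 samples per second by default, i.e. one every 10 ms). Nothing is injected into the app, overhead stays low (<3% CPU in the comparison below), threads are sampled individually, and --subprocesses covers Gunicorn's worker processes.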
| Profiler | Code Change | Overhead (approx., workload-dependent) | Live Attach | Flamegraph |
|---|---|---|---|---|
| cProfile | Yes | 20-50% | No | Manual |
| pyinstrument | Decorator | 10% | No | Yes |
| py-spy | No | <3% | Yes | Built-in |
| Scalene | No | 5% | No (must launch the program) | Yes |
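To make "sampling" concrete, here is a minimal in-process sketch of the idea: a background thread periodically looks at what another thread is executing and counts function names. This is not how py-spy is implemented (py-spy samples from outside the process), and the names sample_stacks, busy_loop, and profile_busy_loop are hypothetical, chosen only for illustration.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval_s, stop_event, counts):
    """Every interval_s seconds, record which function the target thread is in."""
    while not stop_event.is_set():
        # sys._current_frames() maps thread id -> that thread's current frame.
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval_s)


def busy_loop(duration_s):
    """Stand-in for a CPU-bound request handler (hypothetical workload)."""
    deadline = time.perf_counter() + duration_s
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


def profile_busy_loop(duration_s=0.3, interval_s=0.005):
    """Run busy_loop in this thread while a sampler thread watches it."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval_s, stop_event, counts),
    )
    sampler.start()
    busy_loop(duration_s)
    stop_event.set()
    sampler.join()
    return counts
```

Calling profile_busy_loop() returns a Counter in which busy_loop should dominate. The cost of sampling depends on the sampling interval rather than on how many function calls the program makes, which is why sampling profilers tend to be cheaper than tracing ones.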
Install py-spy (Rust Binary, 10s)
# Recommended: pip (prebuilt wheels, no Rust toolchain needed)
pip install py-spy
# Or build from source with Cargo (slower: compiles the binary)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo install py-spy
Verify: py-spy --version
Sample Slow Flask App (Gunicorn)
app.py (CPU bottleneck: naive Python loop):
from flask import Flask
import numpy as np  # used here to simulate heavy compute

app = Flask(__name__)


@app.route('/slow')
def slow_compute():
    result = 0.0
    for i in range(10**6):  # one million scalar NumPy calls per request
        result += np.sin(i) * np.cos(i)
    # Convert the NumPy scalar to a plain float before returning JSON.
    return {'result': float(result)}


@app.route('/health')
def health():
    return {'status': 'ok'}


if __name__ == '__main__':
    app.run()
gunicorn.conf.py:
bind = "0.0.0.0:5000"
workers = 4
Run: gunicorn -c gunicorn.conf.py app:app
Load test: locust -f locustfile.py --headless -u 50 -r 10 --host http://localhost:5000 (locustfile.py is your own Locust script), or ab -n 1000 -c 50 http://localhost:5000/slow
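If neither locust nor ab is installed, a rough standard-library load generator is enough to keep the workers busy while you profile. This is a sketch, not a benchmarking tool: fetch_status and run_load are hypothetical helper names, and the numbers it reports will be less reliable than those from a dedicated load tester.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout_s=10.0):
    """Return the HTTP status code for one GET request, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError are OSError subclasses; treat all as a failed request.
        return None


def run_load(url, total_requests=1000, concurrency=50, fetch=fetch_status):
    """Send total_requests GETs using a pool of `concurrency` threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch, [url] * total_requests))
    elapsed_s = time.perf_counter() - started
    ok = sum(1 for status in statuses if status == 200)
    return {
        "ok": ok,
        "failed": total_requests - ok,
        "elapsed_s": elapsed_s,
        "req_per_s": total_requests / elapsed_s,
    }
```

For example, run_load("http://localhost:5000/slow", total_requests=1000, concurrency=50) mirrors the ab invocation above. The fetch parameter exists only so the function can be exercised without a running server.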
Step 1: Find Flask PID
ps aux | grep '[g]unicorn'
# Or: pgrep -f 'gunicorn.*app:app'
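# Example: master PID 12345. Gunicorn lists one master plus its workers;
# the workers (children of the master) are the processes that serve requests.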
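Because ps lists the Gunicorn master and its workers together, it helps to tell them apart before attaching. The sketch below assumes you captured the output of ps -eo pid,ppid,args as text; split_master_and_workers is a hypothetical helper that treats any gunicorn process whose parent is also a gunicorn process as a worker.

```python
def split_master_and_workers(ps_output):
    """Split gunicorn PIDs from `ps -eo pid,ppid,args` output into (masters, workers)."""
    parents = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, ppid, rest of the command line
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # header line or malformed row
        pid, ppid, args = int(parts[0]), int(parts[1]), parts[2]
        if "gunicorn" in args:
            parents[pid] = ppid
    # A worker's parent is another gunicorn process; a master's parent is not.
    masters = sorted(pid for pid, ppid in parents.items() if ppid not in parents)
    workers = sorted(pid for pid, ppid in parents.items() if ppid in parents)
    return masters, workers
```

Attach to a worker PID to see request handling directly, or attach to the master with --subprocesses to cover all workers at once.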
Step 2: Live Top (Instant Insights)
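py-spy top --pid 12345 --subprocesses
# Inside the live view, press 1-4 to change the sort column.
Output (illustrative; py-spy top reports %Own, %Total, OwnTime, TotalTime, and Function (filename:line) for each function):
%Total   Function
92.3%    slow_compute (app.py)
Bottleneck: about 92% of samples are in the /slow handler. In this example the time splits between the np.sin calls (about 45%) and the Python for loop itself (about 40%).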
Step 3: Flamegraph (Shareable SVG)
py-spy record --pid 12345 -d 30 -o flask-profile.svg --subprocesses --rate 1000
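This samples for 30 seconds and writes flask-profile.svg; open it in a browser. The wider a bar, the more samples included that function, so wide bars are the hotspots. Note that --rate 1000 is ten times the default of 100 samples per second; drop it if overhead on a production host is a concern.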
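If you would rather rank hotspots in text than read an SVG, py-spy record also accepts --format raw, which writes folded stacks: one line per unique stack, frames separated by semicolons (root first), followed by a sample count. The sketch below sums samples by leaf frame; self_time_by_frame is a hypothetical helper, and the frame names in the usage note are made-up sample data.

```python
import collections


def self_time_by_frame(folded_text):
    """Sum sample counts per leaf frame from folded-stack text."""
    counts = collections.Counter()
    for line in folded_text.splitlines():
        # The count is everything after the last space; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # blank or malformed line
        leaf = stack.split(";")[-1]  # innermost frame: where the time was spent
        counts[leaf] += int(count)
    return counts
```

Given a file written with py-spy record --format raw -o profile.txt, calling self_time_by_frame(open("profile.txt").read()).most_common(10) lists the ten frames with the most self time.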
Fix: Vectorize Loop (5x Speedup)
Before (about 200 req/s):
for i in range(10**6):
    result += np.sin(i) * np.cos(i)
After:
i = np.arange(10**6)
result = np.sum(np.sin(i) * np.cos(i))
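Before deploying a rewrite like this, it is worth confirming that both versions agree and measuring the difference on your own machine. The sketch below does both for a configurable n; loop_sum, vectorized_sum, and compare are hypothetical names, and the timings it returns are single-run wall-clock figures, not a rigorous benchmark.

```python
import time

import numpy as np


def loop_sum(n):
    """Original approach: one scalar NumPy call pair per iteration."""
    result = 0.0
    for i in range(n):
        result += np.sin(i) * np.cos(i)
    return float(result)


def vectorized_sum(n):
    """Vectorized approach: whole-array operations, then a single sum."""
    i = np.arange(n)
    return float(np.sum(np.sin(i) * np.cos(i)))


def compare(n=100_000):
    """Return both results and how long each took, in seconds."""
    t0 = time.perf_counter()
    loop_result = loop_sum(n)
    t1 = time.perf_counter()
    vectorized_result = vectorized_sum(n)
    t2 = time.perf_counter()
    return {
        "loop": loop_result,
        "vectorized": vectorized_result,
        "loop_s": t1 - t0,
        "vectorized_s": t2 - t1,
    }
```

The two results can differ in the last few digits because np.sum adds the terms in a different order than the loop, so compare them with a tolerance (for example math.isclose) rather than ==.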
Reload Gunicorn: kill -HUP 12345 (send HUP to the master PID; it gracefully restarts the workers so they pick up the new code)
Benchmarks (M2 Mac, 4 Workers)
| Endpoint | req/s before | req/s after | CPU % (before → after) |
|---|---|---|---|
| /slow loop | 210 | 1050 | 92 → 18 |
| /health | 5000 | 5000 | unchanged |
Measured with: wrk -t12 -c400 -d30s http://localhost:5000/slow
Pitfalls & Fixes
| Issue | Cause | Fix |
|---|---|---|
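| Permission denied | The OS restricts reading another process's memory (ptrace rules on Linux; macOS always requires root) | Run with sudo, or grant the py-spy binary CAP_SYS_PTRACE with setcap |
| No flamegraph | Old py-spy | pip install --upgrade py-spy (or cargo install py-spy --force) |
| Threads missing | Idle threads are hidden by default | --idle to include them; --native adds C-extension frames |
| Uvicorn/FastAPI | ASGI server instead of Gunicorn | Same workflow: attach to the server's PID |
| Docker | Container lacks the SYS_PTRACE capability | Start the container with --cap-add SYS_PTRACE, then run py-spy inside it via docker exec (py-spy must be installed in the image) |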
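On Linux, the "Permission denied" case is often explained by the Yama ptrace_scope setting. The sketch below reads it and returns a short explanation; describe_ptrace_scope is a hypothetical helper, and the path is the usual location on kernels with Yama enabled (it will not exist on macOS or on kernels without Yama).

```python
from pathlib import Path

# Meanings of the Yama ptrace_scope values, per the Linux kernel documentation.
PTRACE_SCOPE_MEANINGS = {
    0: "classic: a process may attach to others owned by the same user",
    1: "restricted: only a parent may attach to its descendants, "
       "so attaching by PID needs root",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "disabled: no process may attach until the next reboot",
}


def describe_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return (value, meaning); value is None if the setting cannot be read."""
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None, "ptrace_scope not readable (non-Linux host or Yama not enabled)"
    return value, PTRACE_SCOPE_MEANINGS.get(value, "unknown value")
```

A value of 1 is a common default on desktop and server distributions, which is why attaching to an already-running Gunicorn process usually needs sudo.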
Checklist
- py-spy top --pid PID: identify the bottleneck function
- py-spy record -o svg: export and share the flame graph
- Fix, then reload with kill -HUP
- Retest under load: wrk
- No code changes!
With py-spy you can profile a production Flask app today: about five minutes of setup, no code changes, and the hotspot is usually visible right away.