The go-to resource for upgrading Python, Django, Flask, and your dependencies.

How to Profile Flask Applications with py-spy Without Adding Code Instrumentation


py-spy Flask profiler no instrumentation: Flask app under load (200req/s) → high CPU? Traditional cProfile/decorator slows prod. py-spy: Zero code changes, attach to live PID (ps aux | grep gunicorn), py-spy top live view (90% bottleneck in numpy loop), py-spy record -o flame.svg. Fix: vectorize → 5x speedup (1000req/s). Targets: “profile flask without code changes”, “py-spy flask gunicorn”, “flask cpu profiling live”.

Why py-spy for Flask? (Prod-Safe Sampling)

cProfile: Insert @profile → prod pollution. py-spy: OS signals (sampling 10ms), low overhead (<3% CPU), multi-thread (Gunicorn workers).

ProfilerCode ChangeOverheadLive AttachFlamegraph
cProfileYes20-50%NoManual
pyinstrumentDecorator10%NoYes
py-spyNo<3%YesBuilt-in
ScaleneNo5%YesYes

Install py-spy (Rust Binary, 10s)

# Recommended: Cargo (fastest)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo install py-spy

# Or pip (wheels)
pip install py-spy

Verify: py-spy --version

Sample Slow Flask App (Gunicorn)

app.py (CPU bottleneck: naive loop):

from flask import Flask
import time
import numpy as np  # Sim heavy compute

app = Flask(__name__)

@app.route('/slow')
def slow_compute():
    result = 0
    for i in range(10**6):  # 100ms+ loop
        result += np.sin(i) * np.cos(i)
    return {'result': result}

@app.route('/health')
def health():
    return {'status': 'ok'}

if __name__ == '__main__':
    app.run()

gunicorn.conf.py:

bind = "0.0.0.0:5000"
workers = 4

Run: gunicorn -c gunicorn.conf.py app:app

Load test: locust -f locustfile.py --headless -u 50 -r 10 (or ab -n 1000 -c 50 http://localhost:5000/slow)

Step 1: Find Flask PID

ps aux | grep '[g]unicorn'
# Or: pgrep -f 'gunicorn.*app:app'
# PID 12345

Step 2: Live Top (Instant Insights)

py-spy top --pid 12345 --sort cpu

Output:

PID    CPU%   COMMAND       slow_compute()    92.3%
12345  92.3%  gunicorn      np.sin()          45.1%
12345  45.1%  gunicorn      for loop          40.2%

Bottleneck: 92% in /slow loop.

Step 3: Flamegraph (Shareable SVG)

py-spy record --pid 12345 -d 30 -o flask-profile.svg --subprocesses --rate 1000

30s sample → flask-profile.svg (open browser). Wide bars = hotspots.

Fix: Vectorize Loop (5x Speedup)

Before: 200req/s

for i in range(10**6):
    result += np.sin(i) * np.cos(i)

After:

i = np.arange(10**6)
result = np.sum(np.sin(i) * np.cos(i))

Reload Gunicorn: kill -HUP 12345

Benchmarks (M2 Mac, 4 Workers)

Endpointreq/s Beforereq/s AfterCPU% Drop
/slow loop210105092→18
/health500050000

wrk -t12 -c400 -d30s http://localhost:5000/slow

Pitfalls & Fixes

IssueCauseFix
Permission deniedNon-root PIDsudo py-spy or setcap
No flamegraphOld py-spycargo update py-spy
Threads missingPython GIL--native (C exts)
Uvicorn/FastAPIASGIWorks same PID
DockerHost networkdocker exec PID

Checklist

  • py-spy top --pid PID: ID bottleneck func
  • py-spy record -o svg: Export/share
  • Fix → kill -HUP reload
  • Retest load: wrk
  • No code changes!

py-spy → Profile any Flask prod today: 5min setup, instant wins.

Sponsored by Durable Programming

Need help maintaining or upgrading your Python application? Durable Programming specializes in keeping Python apps secure, performant, and up-to-date.

Hire Durable Programming