asyncio vs Multiprocessing for CPU-Bound Tasks: When GIL Removal in Python 3.13 Free-Threaded Matters
Asyncio vs multiprocessing for CPU-bound Python, benchmarked: 8× fib(35) tasks take 9.6s under asyncio's thread executor on 3.12, 2.4s with a 4-process Pool (4x faster), and 4.8s with plain threads on the free-threaded 3.13t build (2x over asyncio). The GIL blocks thread parallelism; free-threading changes the rules. Matrix-multiply tests included.
CPU-Bound Tasks & Python GIL Limits
CPU-bound: heavy compute (fib, matrix multiply) vs IO-bound (network/DB waits). The GIL serializes threads → no true parallelism. Asyncio is cooperative and single-threaded → a CPU-bound call blocks the event loop.
| Approach | Parallel? | GIL Bypass | Overhead | Use Case |
|---|---|---|---|---|
| asyncio pure | No | No | Low | IO-only |
| threading | No (GIL) | No | Low | IO |
| multiprocessing | Yes | Processes | High | CPU |
| asyncio + executor | No (GIL; yes on 3.13t) | No | Medium | Mixed |
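The "No (GIL)" rows are easy to verify with a small timing sketch (a hypothetical demo, not from the benchmark repo; exact numbers vary by machine — on a GIL build the two runs take roughly the same time, and only a free-threaded 3.13t build shows ~2x):

```python
import threading
import time

def fib(n):
    # naive recursion: pure-Python CPU work that holds the GIL
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

N = 30  # small enough to run quickly, big enough to measure

# two tasks back to back on one thread
seq = timed(lambda: (fib(N), fib(N)))

# the same two tasks on two threads
def threaded():
    ts = [threading.Thread(target=fib, args=(N,)) for _ in range(2)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

thr = timed(threaded)
print(f"sequential: {seq:.2f}s, 2 threads: {thr:.2f}s")
# On a GIL build the two times are roughly equal: the threads take
# turns holding the GIL instead of running in parallel.
```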
Benchmark Setup (Mise Python 3.12/3.13t)
```shell
# Install Pythons (see article 9)
mise install python@3.12.7 python@3.13.0-free-threaded
mise use python@3.12.7  # or python@3.13.0-free-threaded
pip install numpy  # for the matrix-multiply benchmarks
```
cpu_bench.py (naive recursive Fibonacci; the NumPy matrix-multiply variant lives in the repo):
```python
import argparse
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def cpu_task(n):
    return fib(n)  # fib(35) is roughly 1s of pure-Python CPU work

async def asyncio_cpu(n_workers=4, tasks=8):
    # offloads to threads: doesn't block the loop, but GIL-serialized on 3.12
    start = time.perf_counter()
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(n_workers) as pool:
        results = await asyncio.gather(
            *[loop.run_in_executor(pool, cpu_task, 35) for _ in range(tasks)]
        )
    return time.perf_counter() - start, results

def mp_cpu(n_workers=4, tasks=8):
    # separate processes: true parallelism, pays spawn + pickle overhead
    start = time.perf_counter()
    with Pool(n_workers) as p:
        results = p.map(cpu_task, [35] * tasks)
    return time.perf_counter() - start, results

def single_cpu():
    start = time.perf_counter()
    result = cpu_task(35)
    return time.perf_counter() - start, result

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--mode', choices=['single', 'asyncio', 'mp'], default='single')
    parser.add_argument('--workers', type=int, default=4)
    args = parser.parse_args()
    if args.mode == 'asyncio':
        elapsed, _ = asyncio.run(asyncio_cpu(args.workers))
    elif args.mode == 'mp':
        elapsed, _ = mp_cpu(args.workers)
    else:
        elapsed, _ = single_cpu()
    print(f'{args.mode}: {elapsed:.2f}s')
```
Full code in the repo. Run:
```shell
time python cpu_bench.py --mode asyncio --workers 4  # thread executor: GIL-serialized on 3.12
time python cpu_bench.py --mode mp --workers 4       # processes: truly parallel
```
Results on an M2 Mac (8 cores):
| Python/Mode | Time (8× fib(35)) | Speedup vs asyncio |
|---|---|---|
| 3.12 asyncio | 9.6s | 1x |
| 3.12 mp(4) | 2.4s | 4x |
| 3.13t threads | 4.8s | 2x |
| 3.13t mp(4) | 2.3s | 4.1x |
Asyncio CPU Fail (Event Loop Block)
```python
async def pure_async_block():
    await asyncio.sleep(0.001)  # yields once...
    fib(35)  # ...then blocks the entire event loop for ~1s!
# Don't do this: every other coroutine stalls until fib returns
```
Fix: `loop.run_in_executor(None, fib, 35)` moves the call to the default thread pool. That unblocks the loop, but the threads themselves are still GIL-limited on 3.12.
Multiprocessing: True Parallelism
```python
import multiprocessing as mp

def mp_example():
    # fib as defined in cpu_bench.py above
    with mp.Pool(4) as pool:
        results = pool.map(fib, [35] * 8)
    return results
# Overhead: process spawn is ~100ms, but the work scales across cores
```
Pros: GIL-free parallelism. Cons: IPC serialization cost; memory footprint multiplies per worker (≈4x with 4 workers).
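The IPC cost is easy to underestimate: every argument and return value crosses the process boundary via pickle. A rough sketch of that serialization tax (the 1M-float payload is an illustrative size, not from the benchmark):

```python
import pickle
import time

# a "large" argument, e.g. a big list of floats you might pass to Pool.map
payload = [float(i) for i in range(1_000_000)]

start = time.perf_counter()
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
roundtrip = pickle.loads(blob)
elapsed = time.perf_counter() - start

print(f"pickle round-trip: {elapsed*1000:.1f}ms for {len(blob)/1e6:.1f}MB")
# This cost is paid on every task dispatch; for many small tasks it can
# dominate, which is why multiprocessing favors coarse-grained work.
```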
Python 3.13t No-GIL Impact
Free-threaded build (python3.13t, installed via mise as shown above):
- Threading: CPU work runs in parallel (≈2x speedup on multi-core in these runs).
- Asyncio + executor: the executor threads now run in parallel too.
- But asyncio itself still drives a single event loop → keep offloading CPU work to a ThreadPoolExecutor.
Benchmark: threads catch up to multiprocessing for light tasks.
| Cores | 3.12 Threads | 3.13t Threads | MP |
|---|---|---|---|
| 1 | 9.6s | 9.5s | 9.6s |
| 4 | 9.2s (GIL) | 2.5s | 2.4s |
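Whether a given interpreter is actually running GIL-free can be checked at runtime. `sys._is_gil_enabled()` exists from 3.13; the `getattr` fallback keeps the sketch working on older builds:

```python
import sys
import sysconfig

def gil_info():
    # Py_GIL_DISABLED is 1 only on free-threaded (3.13t) builds
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() reports whether the GIL is active right now;
    # it is absent before 3.13, so fall back to "enabled"
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    return free_threaded_build, gil_enabled

build, enabled = gil_info()
print(f"free-threaded build: {build}, GIL enabled now: {enabled}")
```

Handy as a guard in benchmark scripts: a 3.13t binary can still re-enable the GIL (e.g. via `PYTHON_GIL=1`), so checking the build alone is not enough.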
Matrix Multiply (NumPy-Heavy)
NumPy releases the GIL inside its C kernels → threads already parallelize NumPy-heavy work, even on 3.12. What no-GIL changes is pure-Python loops, which finally speed up under threads.
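The same effect can be seen without NumPy: hashlib also releases the GIL while hashing large buffers, so it works as a stdlib stand-in for the NumPy case (a hypothetical demo; buffer size and thread count are arbitrary):

```python
import hashlib
import threading
import time

data = b"x" * (64 * 1024 * 1024)  # 64MB buffer; hashing it releases the GIL

def hash_it():
    hashlib.sha256(data).hexdigest()

start = time.perf_counter()
hash_it(); hash_it()
seq = time.perf_counter() - start

start = time.perf_counter()
ts = [threading.Thread(target=hash_it) for _ in range(2)]
for t in ts: t.start()
for t in ts: t.join()
par = time.perf_counter() - start

print(f"sequential: {seq:.2f}s, 2 threads: {par:.2f}s")
# On a multi-core box the threaded run should be faster, GIL or not,
# because the C hash loop drops the GIL while it works.
```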
When to Use What
| Task | Recommend | Why |
|---|---|---|
| Pure CPU | multiprocessing | Scales cores |
| CPU+IO | asyncio + executor | Low overhead |
| 3.13t CPU | threading / asyncio + executor | Simpler than mp |
| <1s tasks | asyncio (offload) | Avoid spawn |
Checklist
- CPU-bound? → multiprocessing, or threads on 3.13t
- IO-heavy? → asyncio
- Test: `time python cpu_bench.py --mode mp`
- Mise: `mise use python@3.13.0-free-threaded`
- Monitor: `py-spy top --pid <PID>`
Python 3.13t GIL gone → rethink CPU concurrency. Multiprocessing wins today; threads tomorrow.
Sponsored by Durable Programming
Need help maintaining or upgrading your Python application? Durable Programming specializes in keeping Python apps secure, performant, and up-to-date.
Hire Durable Programming