| Crates.io | gunicorn-autoscaler |
| lib.rs | gunicorn-autoscaler |
| version | 0.2.1 |
| created_at | 2025-12-06 01:08:34.702896+00 |
| updated_at | 2025-12-06 01:09:43.803801+00 |
| description | Gunicorn autoscaling wrapper (dynamic workers via StatsD + TTIN/TTOU) |
| homepage | |
| repository | https://github.com/Grail-Computer/Gunicorn-Autoscaler.git |
| max_upload_size | |
| id | 1969487 |
| size | 37,499 |
gunicorn-autoscaler is a lightweight Rust wrapper for Gunicorn that provides autoscaling capabilities for FastAPI web applications.
It manages Gunicorn processes and listens to StatsD metrics to dynamically add or remove workers based on real-time request pressure, with no redeploy or complex orchestrator rules required.
This tool was built to solve a specific problem on Railway (and similar PaaS providers).
> [!WARNING]
> **Production Note:** This tool was optimized for a specific single-node PaaS use case. While robust, it makes opinionated choices (like using signals for scaling). If you are on Kubernetes, horizontal pod autoscaling (HPA) is usually the preferred "cloud-native" scaling method. Use this if you need vertical autoscaling within a single container/node.
- Uses Gunicorn's native signals (TTIN, TTOU) to manage workers.
- Supports `uvicorn.workers.UvicornWorker` out of the box.

Download the binary or build from source:
```sh
cargo install gunicorn-autoscaler
```
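The TTIN/TTOU mechanism mentioned above can be sketched in a few lines. This is a hypothetical illustration (the crate itself is Rust; the function names here are not its API): Gunicorn's master process adds one worker on SIGTTIN and retires one on SIGTTOU, so scaling is just a sequence of signals.

```python
import os
import signal

def plan_signals(current: int, desired: int) -> list[signal.Signals]:
    """Compute the signal sequence that moves a Gunicorn master from
    `current` to `desired` workers: one SIGTTIN per increment,
    one SIGTTOU per decrement."""
    step = signal.SIGTTIN if desired > current else signal.SIGTTOU
    return [step] * abs(desired - current)

def apply_scaling(master_pid: int, current: int, desired: int) -> None:
    """Deliver the planned signals to the Gunicorn master process."""
    for sig in plan_signals(current, desired):
        os.kill(master_pid, sig)
```

Because the signals only nudge the worker count one step at a time, the autoscaler must track how many workers it believes are running and re-plan on each metrics tick.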
| Variable | Default | Description |
|---|---|---|
| `GUNICORN_AUTOSCALER_MIN_WORKERS` | 2 | Minimum workers to keep alive |
| `GUNICORN_AUTOSCALER_MAX_WORKERS` | 2*cores + 1 | Maximum workers cap |
| `GUNICORN_AUTOSCALER_SLO_P95_MS` | 300 | Target p95 latency in ms |
| `GUNICORN_AUTOSCALER_IDLE_SECONDS` | 60 | Seconds of low traffic before scaling down |
| `GUNICORN_AUTOSCALER_STATSD_ADDR` | 127.0.0.1:9125 | StatsD listener address |
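As a sketch of how the defaults in the table resolve (assuming the autoscaler falls back to 2 and 2*cores + 1 when the variables are unset; the function name here is illustrative, not the crate's API):

```python
import os

def worker_bounds() -> tuple[int, int]:
    """Resolve min/max worker counts from the environment,
    falling back to the documented defaults."""
    cores = os.cpu_count() or 1
    min_w = int(os.environ.get("GUNICORN_AUTOSCALER_MIN_WORKERS", 2))
    max_w = int(os.environ.get("GUNICORN_AUTOSCALER_MAX_WORKERS", 2 * cores + 1))
    return min_w, max_w
```

On a 4-core container the unset defaults would resolve to (2, 9).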
Replace your standard `gunicorn` command with `gunicorn-autoscaler`:

```sh
gunicorn-autoscaler myapp:app --bind 0.0.0.0:8000 --worker-class uvicorn.workers.UvicornWorker
```
If you are using `UvicornWorker`, you must emit StatsD metrics from your application manually, because Uvicorn bypasses Gunicorn's metric tracking.
Add this middleware to your FastAPI/Starlette app:
```python
import socket
import time

from fastapi import FastAPI, Request

app = FastAPI()  # your existing application instance

# Reuse a single UDP socket for all metric sends
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    start = time.time()
    response = await call_next(request)
    duration_ms = int((time.time() - start) * 1000)
    # Emit metrics in Gunicorn's StatsD format
    try:
        sock.sendto(b"gunicorn.requests:1|c", ("127.0.0.1", 9125))
        sock.sendto(f"gunicorn.request.duration:{duration_ms}|ms".encode(), ("127.0.0.1", 9125))
    except OSError:
        pass  # never let a metrics failure break the request
    return response
```
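For reference, the datagrams the middleware emits follow the plain StatsD line format, `name:value|type`. A minimal parser sketch (this is an illustration of the wire format, not the crate's actual parser):

```python
def parse_statsd(datagram: bytes) -> tuple[str, float, str]:
    """Split a StatsD datagram like b'gunicorn.request.duration:42|ms'
    into (metric name, value, metric type)."""
    name, rest = datagram.decode().split(":", 1)
    value, kind = rest.split("|", 1)
    return name, float(value), kind
```

Here `|c` marks a counter (request count) and `|ms` a timing sample, which is what the autoscaler's p95 latency target is computed from.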
This repository includes a Docker-based integration test suite to verify autoscaling behavior (burst up and idle down). The test runner builds the container, runs both "burst" and "idle" scenarios, and verifies log output.
```sh
# Create venv and install the test dependency
uv venv
source .venv/bin/activate
uv pip install httpx

# Run the full suite
python3 tests/run_tests.py
```