gunicorn-autoscaler

version: 0.2.1
created_at: 2025-12-06 01:08:34.702896+00
updated_at: 2025-12-06 01:09:43.803801+00
description: Gunicorn autoscaling wrapper (dynamic workers via StatsD + TTIN/TTOU)
repository: https://github.com/Grail-Computer/Gunicorn-Autoscaler.git
id: 1969487
size: 37,499
owner: yashagarwal1994

README

Gunicorn-autoscaler

gunicorn-autoscaler is a lightweight Rust wrapper for Gunicorn that provides autoscaling capabilities for FastAPI web applications.

It manages the Gunicorn process and listens for StatsD metrics, dynamically adding or removing workers based on real-time request pressure, without a full redeploy or complex orchestrator rules.
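
The scaling mechanism is the standard one Gunicorn exposes to operators: the master process adds a worker on TTIN and retires one on TTOU. As a minimal sketch, you can do the same thing by hand, assuming the master PID has been written to gunicorn.pid (an illustrative path; use whatever you pass to --pid):

import os
import signal

# Gunicorn's master resizes its worker pool on these signals:
# SIGTTIN adds one worker, SIGTTOU removes one.
with open("gunicorn.pid") as f:  # illustrative path
    master_pid = int(f.read().strip())

os.kill(master_pid, signal.SIGTTIN)  # scale up by one worker
os.kill(master_pid, signal.SIGTTOU)  # scale down by one worker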

Inspiration & Use Case

This tool was built to solve a specific problem on Railway (and similar PaaS providers):

  • Single-node scalability: I wanted a single service to handle variable load without paying for over-provisioned resources.
  • Cost efficiency: When traffic drops, the service should contract to minimal resources. When traffic spikes, it should instantly burst up.
  • Simplicity: No K8s HPA or complex external monitoring hooks—just a binary that watches metrics and manages the process.

⚠️ Production Note: This tool was optimized for a specific single-node PaaS use case. While robust, it makes opinionated choices (like using signals for scaling). If you are on Kubernetes, horizontal pod autoscaling (HPA) is usually the preferred "cloud-native" scaling method. Use this if you need vertical autoscaling within a single container/node.

Features

  • Autoscaling: scales workers up/down based on RPS and request duration (p95); see the sketch after this list.
  • Burst Mode: instantly adds workers during sudden traffic spikes.
  • Zero-downtime: uses standard Unix signals (TTIN, TTOU) to manage workers.
  • Single Binary: ships as a static Rust binary alongside your Python app.
  • Works with Uvicorn: supports uvicorn.workers.UvicornWorker out of the box.
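
To make the policy concrete, here is a simplified sketch of the kind of decision loop such a controller runs each tick. The names, thresholds, and logic are illustrative assumptions, not the crate's actual internals:

# Illustrative scaling policy; not the crate's real implementation.
SLO_P95_MS = 300      # target p95 latency (cf. GUNICORN_AUTOSCALER_SLO_P95_MS)
MIN_WORKERS = 2
MAX_WORKERS = 9       # e.g. 2*cores + 1 on a 4-core machine
IDLE_SECONDS = 60     # sustained low traffic before contracting

def decide(workers: int, p95_ms: float, rps: float, idle_for: float) -> int:
    """Return the desired worker count for the next tick."""
    if p95_ms > SLO_P95_MS and workers < MAX_WORKERS:
        return workers + 1   # latency over SLO: send TTIN
    if rps < 1.0 and idle_for >= IDLE_SECONDS and workers > MIN_WORKERS:
        return workers - 1   # sustained idle: send TTOU
    return workers           # within bounds: hold steady

print(decide(2, 450.0, 30.0, 0.0))  # -> 3: latency over SLO, scale up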

Usage

1. Install

Download the binary or build from source:

cargo install gunicorn-autoscaler

2. Configure (Environment Variables)

Variable                           Default          Description
GUNICORN_AUTOSCALER_MIN_WORKERS    2                Minimum workers to keep alive
GUNICORN_AUTOSCALER_MAX_WORKERS    2*cores + 1      Maximum workers cap
GUNICORN_AUTOSCALER_SLO_P95_MS     300              Target p95 latency in ms
GUNICORN_AUTOSCALER_IDLE_SECONDS   60               Seconds of low traffic before scaling down
GUNICORN_AUTOSCALER_STATSD_ADDR    127.0.0.1:9125   StatsD listener address
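
The 2*cores + 1 default for the worker cap is Gunicorn's usual sizing heuristic; for example, on a 4-core machine it resolves to 9:

import os

# Gunicorn's common "2 * cores + 1" worker-count heuristic.
cores = os.cpu_count() or 1
print(2 * cores + 1)  # 9 on a 4-core machine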

3. Run

Replace your standard gunicorn command with gunicorn-autoscaler:

gunicorn-autoscaler myapp:app --bind 0.0.0.0:8000 --worker-class uvicorn.workers.UvicornWorker

⚠️ Uvicorn & StatsD Requirement

If you are using UvicornWorker, you must emit StatsD metrics from your application manually, because Uvicorn bypasses Gunicorn's metric tracking.

Add this middleware to your FastAPI/Starlette app:

import socket
import time

from fastapi import FastAPI, Request

app = FastAPI()

# One long-lived UDP socket; StatsD is fire-and-forget over UDP.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    start = time.time()
    response = await call_next(request)
    duration_ms = int((time.time() - start) * 1000)

    # Emit metrics in Gunicorn's StatsD format: a request counter
    # and a millisecond timer the autoscaler reads for p95 latency.
    try:
        sock.sendto(b"gunicorn.requests:1|c", ("127.0.0.1", 9125))
        sock.sendto(f"gunicorn.request.duration:{duration_ms}|ms".encode(), ("127.0.0.1", 9125))
    except OSError:
        pass  # never let metrics emission break a request

    return response
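
To confirm the middleware is actually emitting, you can bind a throwaway UDP listener on the StatsD address and watch the datagrams arrive (run it while the autoscaler is stopped, since only one process can bind the port). This is a debugging sketch, not part of the tool:

import socket

# Print raw StatsD datagrams arriving on the autoscaler's address.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 9125))
while True:
    data, _ = listener.recvfrom(1024)
    print(data.decode())  # e.g. "gunicorn.requests:1|c"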

Development & Testing

This repository includes a Docker-based integration test suite to verify autoscaling behavior (burst up and idle down).

Prerequisites

  • Docker
  • Python 3 + uv

Running Tests

The test runner builds the container, runs both "burst" and "idle" scenarios, and verifies log output.

# Create venv and install dependency
uv venv
source .venv/bin/activate
uv pip install httpx

# Run full suite
python3 tests/run_tests.py
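
The scenario scripts live under tests/. As a rough illustration of the kind of burst traffic they generate, a minimal httpx client could look like this (the URL and request count are assumptions, not the suite's actual parameters):

import asyncio
import httpx

async def burst(url: str = "http://127.0.0.1:8000/", n: int = 200) -> None:
    # Fire n concurrent requests to simulate a sudden traffic spike.
    async with httpx.AsyncClient() as client:
        await asyncio.gather(*(client.get(url) for _ in range(n)))

asyncio.run(burst())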