| Crates.io | tokio-rate-limit |
| lib.rs | tokio-rate-limit |
| version | 0.8.0 |
| created_at | 2025-11-05 09:12:43.093329+00 |
| updated_at | 2025-11-10 00:36:47.892677+00 |
| description | High-performance, lock-free rate limiting library with pluggable algorithms and Axum middleware |
| homepage | |
| repository | https://github.com/danielrcurtis/tokio-rate-limit |
| max_upload_size | |
| id | 1917629 |
| size | 844,358 |
High-performance rate limiting library for Rust with lock-free token accounting, lock-free concurrent hashmap for per-key state, pluggable algorithms, and Axum middleware support.
Performance: 20.5M ops/sec single-threaded (v0.7.0 probabilistic) | 17.5M ops/sec deterministic (v0.8.0) | Multi-threaded +17% improvement | Sub-microsecond P99 latency
Most Rust rate limiting libraries (like governor) are optimized for global rate limiting - applying a single limit across all requests. This works great for simple "API allows 1000 requests/sec total" scenarios.
But what if you need per-client rate limits? Different limits for each user, IP address, or API key?
That's where tokio-rate-limit shines:
Use Cases: per-user, per-IP, and per-API-key rate limits - any scenario where each client needs its own independent limit.
Highlights:
- IETF standard rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) (NEW in v0.2.0)
- Blocking acquire() and acquire_timeout() (NEW in v0.2.0)

v0.8.0 maintains excellent performance with Axum 0.8.6 support!
Benchmarks on Apple M1 Pro using flurry's lock-free HashMap with tokio 1.40:
| Configuration | Latency | Throughput | Notes |
|---|---|---|---|
| Single-threaded | 57ns | 17.5M ops/sec | Baseline with micro-sharding |
| 2 threads | 118ns | 8.5M ops/sec | Excellent multi-threaded scaling |
| 4 threads | 134ns | 7.5M ops/sec | Real-world web server performance |
| 8 threads | 213ns | 4.7M ops/sec | +17% vs v0.7.2 - Production optimized |
Micro-Sharding Architecture:
For ultra-high throughput scenarios where 1-2% error margin is acceptable:
| Configuration | Latency | Throughput | Improvement |
|---|---|---|---|
| Single-threaded (5% sampling) | 49ns | 20.5M ops/sec | +11.4% |
| 8 threads (5% sampling) | 196ns | 5.1M ops/sec | +24.6% |
| Cost-based (1% sampling) | 48ns | 21.0M ops/sec | +29.6% |
ProbabilisticTokenBucket (NEW in v0.7.0):
When to use Probabilistic:
Algorithm Comparison (v0.8.0):
Observability Overhead (Optional Features):
See BENCHMARK_COMPARISON_v0.5.0.md for detailed analysis across versions.
Key Insight: This library excels at per-key rate limiting (separate limits per client), while libraries like governor are optimized for global rate limiting (single limit for all requests). Both have their use cases, and this library fills the per-key niche with excellent performance.
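As a minimal sketch of the per-key model (using the builder and check() API shown in the quick start below; the exact counts assume standard token-bucket semantics), each key gets its own independent bucket, so one client exhausting its budget never affects another:

use tokio_rate_limit::RateLimiter;

#[tokio::main]
async fn main() {
    // 5 requests/second with a burst of 5, tracked separately per key
    let limiter = RateLimiter::builder()
        .requests_per_second(5)
        .burst(5)
        .build()
        .unwrap();

    // Drain one client's bucket...
    for _ in 0..6 {
        let d = limiter.check("ip:203.0.113.7").await.unwrap();
        println!("client A permitted: {}", d.permitted);
    }

    // ...while a different key still has a full bucket of its own.
    let d = limiter.check("ip:198.51.100.4").await.unwrap();
    println!("client B permitted: {}", d.permitted); // expected: true
}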
┌─────────────────────────────────────────────────────────────┐
│ tokio-rate-limit │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ RateLimiter API │ │
│ │ check() | check_with_cost() | acquire() │ │
│ └────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌────────────────────▼──────────────────────────────────┐ │
│ │ Algorithm Trait │ │
│ │ (Pluggable, Token Bucket default) │ │
│ └────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌────────────────────▼──────────────────────────────────┐ │
│ │ flurry::HashMap<Key, TokenBucket> │ │
│ │ (Lock-free concurrent hashmap) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Bucket │ │ Bucket │ │ Bucket │ ... │ │
│ │ │ "ip1" │ │ "user2" │ │ "key3" │ │ │
│ │ │ tokens: │ │ tokens: │ │ tokens: │ │ │
│ │ │ AtomicU64│ │ AtomicU64│ │ AtomicU64│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ │ Each bucket: atomic CAS operations │ │
│ │ Zero locks, zero contention │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ Optional: TTL-based eviction (1% probabilistic cleanup) │
│ Optional: Tracing spans & metrics │
└─────────────────────────────────────────────────────────────┘
Request Flow (Sub-microsecond):
Total: ~45-65ns for in-memory permission check.
Add to your Cargo.toml:
[dependencies]
tokio-rate-limit = "0.8"
# For Axum middleware support
tokio-rate-limit = { version = "0.8", features = ["middleware"] }
# For Tonic gRPC middleware support
tokio-rate-limit = { version = "0.8", features = ["tonic-support"] }
# For observability (tracing + metrics)
tokio-rate-limit = { version = "0.8", features = ["middleware", "observability"] }
tokio-rate-limit = { version = "0.8", features = ["middleware", "metrics-support"] }
use tokio_rate_limit::RateLimiter;

#[tokio::main]
async fn main() {
    // Create a rate limiter: 100 requests/second, burst of 200
    let limiter = RateLimiter::builder()
        .requests_per_second(100)
        .burst(200)
        .build()
        .unwrap();

    // Check if a request should be allowed
    let decision = limiter.check("client-123").await.unwrap();

    if decision.permitted {
        // Process request
        println!("Request allowed! Remaining: {}", decision.remaining.unwrap());
        println!("Reset in: {:?}", decision.reset.unwrap());
    } else {
        // Rate limit exceeded
        println!("Rate limited! Retry after: {:?}", decision.retry_after.unwrap());
    }
}
For ultra-high throughput scenarios where 1-2% error margin is acceptable:
use tokio_rate_limit::algorithm::ProbabilisticTokenBucket;
use tokio_rate_limit::RateLimiter;

#[tokio::main]
async fn main() {
    // Create probabilistic algorithm with 5% sampling (recommended)
    let algorithm = ProbabilisticTokenBucket::new(
        100, // capacity
        100, // refill_rate per second
        20,  // sample_rate (5% = 1 in 20 requests)
    );
    let limiter = RateLimiter::from_algorithm(algorithm);

    // Use exactly like regular TokenBucket
    let decision = limiter.check("user-123").await.unwrap();
    if decision.permitted {
        println!("Request allowed! (probabilistic sampling)");
        // 24.6% faster at 8 threads, <1% error margin
    }
}
Recommended Configuration: 5% Sampling
Sampling Rate Guide:
When to use:
When NOT to use:
See PROBABILISTIC_ANALYSIS.md for comprehensive benchmarks and examples/probabilistic_rate_limiting.rs for production examples.
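As a rough illustration of how the sampling-rate guide above maps onto the constructor (assuming the ProbabilisticTokenBucket::new(capacity, refill_rate, sample_rate) signature from the example above, where sample_rate means "1 in N requests"):

use tokio_rate_limit::algorithm::ProbabilisticTokenBucket;

// sample_rate is "1 in N": a larger N means less frequent exact accounting,
// which trades a small error margin for higher throughput.
let five_percent = ProbabilisticTokenBucket::new(100, 100, 20);  // 1 in 20  = 5%, the recommended default
let one_percent  = ProbabilisticTokenBucket::new(100, 100, 100); // 1 in 100 = 1%, maximum throughput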
Assign different costs to different operations:
let limiter = RateLimiter::builder()
    .requests_per_second(100)
    .burst(200)
    .build()
    .unwrap();

// Light operation - costs 1 token
limiter.check_with_cost("user-123", 1).await?;

// Heavy operation - costs 50 tokens
limiter.check_with_cost("user-123", 50).await?;

// Use cases:
// - Simple queries: cost=1, Complex queries: cost=10
// - Small uploads: cost=1, Large uploads: cost=100
// - Fast API calls: cost=1, Expensive AI inference: cost=50
Wait for tokens to become available:
// Block indefinitely until tokens available
let decision = limiter.acquire("user-123").await?;
// Block with timeout
use std::time::Duration;
let decision = limiter.acquire_timeout("user-123", Duration::from_secs(5)).await?;
if !decision.permitted {
    println!("Timed out waiting for tokens");
}
// Non-blocking (original behavior)
let decision = limiter.try_acquire("user-123").await?;
use axum::{Router, routing::get};
use tokio_rate_limit::{RateLimiter, middleware::RateLimitLayer};
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let limiter = Arc::new(
        RateLimiter::builder()
            .requests_per_second(100)
            .burst(200)
            .build()
            .unwrap()
    );

    let app: Router = Router::new()
        .route("/api/data", get(handler))
        // Apply rate limiting to all routes (IP-based by default)
        .layer(RateLimitLayer::new(limiter));

    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000")
        .await
        .unwrap();

    axum::serve(
        listener,
        app.into_make_service_with_connect_info::<std::net::SocketAddr>(),
    )
    .await
    .unwrap();
}

async fn handler() -> &'static str {
    "Hello, World!"
}
Response Headers (IETF RFC Standards):
When rate limit is applied, responses include:
RateLimit-Limit: 100
RateLimit-Remaining: 42
RateLimit-Reset: 3
X-RateLimit-Limit: 100 # Legacy header (backward compat)
X-RateLimit-Remaining: 42 # Legacy header (backward compat)
When rate limit is exceeded (HTTP 429):
HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 8
Retry-After: 8
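A well-behaved client can use these headers to back off. Here is a hypothetical client-side sketch (using the reqwest crate, which is not part of this library) that honors Retry-After on a 429:

use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let resp = reqwest::get("http://127.0.0.1:3000/api/data").await?;

    if resp.status() == reqwest::StatusCode::TOO_MANY_REQUESTS {
        // Parse Retry-After (seconds) and back off before retrying.
        let wait_secs = resp
            .headers()
            .get("Retry-After")
            .and_then(|v| v.to_str().ok())
            .and_then(|s| s.parse::<u64>().ok())
            .unwrap_or(1);
        tokio::time::sleep(Duration::from_secs(wait_secs)).await;
        // ...retry the request here
    }

    Ok(())
}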
Rate limit by user ID, API key, or any custom logic:
use tokio_rate_limit::middleware::{RateLimitLayer, CustomKeyExtractor};
use axum::{body::Body, extract::Request};

let limiter = Arc::new(
    RateLimiter::builder()
        .requests_per_second(50)
        .burst(100)
        .build()
        .unwrap()
);

// Extract user ID from header
let app: Router = Router::new()
    .route("/api/user-data", get(handler))
    .layer(RateLimitLayer::with_extractor(
        limiter,
        CustomKeyExtractor::new(|req: &Request<Body>| {
            req.headers()
                .get("X-User-Id")
                .and_then(|v| v.to_str().ok())
                .map(|s| s.to_string())
        }),
    ));
Rate limit gRPC services with native Tonic integration:
use tokio_rate_limit::{RateLimiter, tonic_middleware::GrpcRateLimitLayer};
use tonic::transport::Server;
use std::sync::Arc;

let limiter = Arc::new(
    RateLimiter::builder()
        .requests_per_second(100)
        .burst(200)
        .build()?
);

Server::builder()
    .layer(GrpcRateLimitLayer::new(limiter))
    .add_service(GreeterServer::new(greeter))
    .serve("[::1]:50051".parse()?)
    .await?;
Key Extraction Strategies:
// Per-method (default) - different methods have independent limits
GrpcRateLimitLayer::new(limiter)

// Per-user (from metadata) - extract from gRPC metadata
use tokio_rate_limit::tonic_middleware::MetadataKeyExtractor;
GrpcRateLimitLayer::with_extractor(
    limiter,
    MetadataKeyExtractor::new("user-id")
)

// Per-IP - rate limit by client IP address
use tokio_rate_limit::tonic_middleware::IpKeyExtractor;
GrpcRateLimitLayer::with_extractor(limiter, IpKeyExtractor)

// Custom - implement your own logic
use tokio_rate_limit::tonic_middleware::CustomGrpcKeyExtractor;
GrpcRateLimitLayer::with_extractor(
    limiter,
    CustomGrpcKeyExtractor::new(|req| {
        Some(format!("custom:{}", req.uri().path()))
    })
)
Features:
Enable with feature flag:
tokio-rate-limit = { version = "0.5", features = ["tonic-support"] }
This library provides two rate limiting algorithms with different characteristics:
The token bucket algorithm allows bursts up to capacity and refills at a constant rate.
Characteristics:
Best For:
Example:
use tokio_rate_limit::RateLimiter;

// Token bucket: 100/sec rate, burst of 200
let limiter = RateLimiter::builder()
    .requests_per_second(100)
    .burst(200)
    .build()
    .unwrap();
The leaky bucket algorithm enforces a steady rate by "leaking" tokens at a constant rate.
Characteristics:
Best For:
Example:
use tokio_rate_limit::RateLimiter;
use tokio_rate_limit::algorithm::LeakyBucket;
// Leaky bucket: capacity 50, leak rate 100/sec
let algorithm = LeakyBucket::new(50, 100);
let limiter = RateLimiter::from_algorithm(algorithm);
| Feature | Token Bucket | Leaky Bucket |
|---|---|---|
| Bursts | ✅ Allowed (up to capacity) | ❌ Not allowed |
| Rate Enforcement | Average over time | Strict steady rate |
| Traffic Pattern | Bursty | Smooth |
| Best For | Public APIs, users | Backend protection |
| Predictability | Moderate | High |
| Performance | 17.5M ops/sec (v0.8.0) | 15.9M ops/sec (v0.8.0) |
Both algorithms are virtually identical in performance (within 3-7%). See ALGORITHM_BENCHMARKS.md for detailed benchmark results.
When to Choose:
See examples/leaky_bucket.rs for a detailed comparison with examples.
By default, token buckets are created on-demand and persist indefinitely. For high-cardinality keys (e.g., per-IP limits with millions of IPs), use TTL-based eviction:
use std::time::Duration;
use tokio_rate_limit::algorithm::TokenBucket;

// Evict idle buckets after 1 hour
let algorithm = TokenBucket::with_ttl(
    200,                       // capacity
    100,                       // refill rate per second
    Duration::from_secs(3600), // TTL
);
let limiter = RateLimiter::from_algorithm(algorithm);
How it works: buckets that have been idle longer than the TTL are evicted opportunistically via the 1% probabilistic cleanup described in the architecture overview above.
Guidance:
Enable distributed tracing and metrics for production debugging:
# Cargo.toml
tokio-rate-limit = { version = "0.2", features = ["middleware", "observability"] }
# For metrics collection
tokio-rate-limit = { version = "0.2", features = ["middleware", "metrics-support"] }
When the observability feature is enabled, all rate limit checks create trace spans:
use tracing_subscriber;
// Configure tracing subscriber (once at startup)
tracing_subscriber::fmt::init();
// All rate limit checks now emit spans
let decision = limiter.check("user-123").await?;
// Span includes:
// - key: "user-123"
// - permitted: true/false
// - remaining: token count
// - latency: nanoseconds
Trace Output Example:
DEBUG tokio_rate_limit::limiter: Rate limit check: PERMITTED key="user-123" remaining=199
When the metrics-support feature is enabled:
// Metrics automatically recorded:
// - tokio_rate_limit.requests.allowed (counter)
// - tokio_rate_limit.requests.denied (counter)
// - tokio_rate_limit.remaining_tokens (histogram)
// Use any metrics backend (Prometheus, StatsD, etc.)
use metrics_exporter_prometheus::PrometheusBuilder;
PrometheusBuilder::new()
    .install()
    .expect("Failed to install Prometheus exporter");
Performance Impact:
See OBSERVABILITY.md for comprehensive integration guide with OpenTelemetry, Jaeger, Prometheus, and production best practices.
See the examples/ directory for complete working examples:
- basic.rs - Direct usage without middleware
- axum_middleware.rs - IP-based rate limiting with Axum
- custom_key_extraction.rs - User ID and API key rate limiting
- cost_based_limiting.rs - Weighted operations (NEW in v0.2.0)
- blocking_acquire.rs - Wait patterns (NEW in v0.2.0)
- leaky_bucket.rs - Algorithm comparison: token vs leaky bucket (NEW in v0.3.0)

Run examples:
# Basic usage
cargo run --example basic
# Axum middleware with IP-based rate limiting
cargo run --example axum_middleware --features middleware
# Custom key extraction (user ID, API key)
cargo run --example custom_key_extraction --features middleware
# Cost-based limiting
cargo run --example cost_based_limiting
# Blocking acquire patterns
cargo run --example blocking_acquire
# Leaky bucket algorithm comparison
cargo run --example leaky_bucket
The library uses a token bucket algorithm for rate limiting:
When a request arrives:
1. Look up (or lazily create) the bucket for the request's key in the lock-free concurrent hashmap
2. Refill the bucket based on the time elapsed since its last update
3. Attempt to deduct cost tokens via compare-and-swap (lock-free)

The entire hot path is lock-free, using atomic operations for both token accounting and key access.
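As a rough, simplified sketch of what such a lock-free hot path can look like (not this crate's actual internals; it assumes the flurry crate and ignores refill timing):

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Try to deduct `cost` tokens with a compare-and-swap loop (no locks).
fn try_deduct(tokens: &AtomicU64, cost: u64) -> bool {
    let mut current = tokens.load(Ordering::Acquire);
    loop {
        if current < cost {
            return false; // not enough tokens: deny
        }
        match tokens.compare_exchange_weak(
            current,
            current - cost,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,            // deducted atomically: permit
            Err(actual) => current = actual, // lost a race: retry with the fresh value
        }
    }
}

fn main() {
    // Per-key buckets live in a lock-free concurrent map (flurry).
    let buckets: flurry::HashMap<String, Arc<AtomicU64>> = flurry::HashMap::new();
    let pinned = buckets.pin();

    // Lazily create the bucket for this key (a real implementation would also
    // track a last-refill timestamp and top the bucket up here).
    if pinned.get("client-123").is_none() {
        pinned.insert("client-123".to_string(), Arc::new(AtomicU64::new(200)));
    }

    let bucket = pinned.get("client-123").unwrap();
    println!("permitted = {}", try_deduct(bucket, 1));
}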
As of v0.2.0, the library uses flurry's lock-free concurrent hashmap which automatically tunes its internal parameters for optimal performance across different workloads and thread counts. No manual tuning is required.
Performance improvements in v0.2.0:
The with_shard_count() method is now deprecated and internally calls the standard constructor, as flurry does not expose shard configuration.
| Feature | tokio-rate-limit | governor |
|---|---|---|
| Use Case | Per-key rate limiting | Global rate limiting |
| Performance | 20.5M ops/sec probabilistic / 17.5M deterministic (v0.8.0) | 357M ops/sec (global) |
| Key Management | Built-in per-key tracking | Manual key management |
| Middleware | Axum integration included | DIY middleware |
| Algorithm | Pluggable (token bucket default) | GCRA algorithm |
| Standards | IETF RateLimit headers | Custom headers |
| Cost-Based | ✅ Built-in | ❌ Not supported |
| Observability | ✅ Optional tracing/metrics | ❌ Manual |
When to use tokio-rate-limit:
When to use governor:
Both libraries are excellent choices depending on your use case!
- middleware - Enables Axum middleware support (adds axum and tower dependencies)
- tonic-support - Enables Tonic gRPC middleware support (adds tonic, tower, http dependencies) (NEW in v0.5.0)
- observability - Enables distributed tracing via the tracing crate (NEW in v0.2.0)
- metrics-support - Enables metrics collection via the metrics crate (implies observability) (NEW in v0.2.0)

Full API documentation is available at docs.rs/tokio-rate-limit.
Previous Releases:
See CHANGELOG.md for complete release history and PROBABILISTIC_ANALYSIS.md for comprehensive v0.7.0 benchmarks.
This crate requires Rust 1.75.0 or later.
- Share a single RateLimiter across tasks and handlers (wrap it in an Arc)
- Use TokenBucket::with_ttl() when you have millions of unique keys

# Run all tests
cargo test
# Run tests with all features
cargo test --all-features
# Run benchmarks
cargo bench
# Run specific benchmark
cargo bench --bench rate_limit_performance
# Run examples
cargo run --example basic
cargo run --example axum_middleware --features middleware
cargo run --example cost_based_limiting
Contributions are welcome! Please feel free to submit a Pull Request.
Licensed under either of:
at your option.