| Field | Value |
| --- | --- |
| Crates.io | onwards |
| lib.rs | onwards |
| version | 0.7.1 |
| created_at | 2025-07-31 07:04:09.76829+00 |
| updated_at | 2025-09-25 12:54:21.057771+00 |
| description | A flexible LLM proxy library |
| homepage | https://github.com/doublewordai/onwards |
| repository | https://github.com/doublewordai/onwards |
| max_upload_size | |
| id | 1774526 |
| size | 180,091 |
A Rust-based AI Gateway that provides a unified interface for routing requests to OpenAI-compatible targets. The goal is to be as "transparent" as possible.
Create a `config.json` file with your target configurations:
```json
{
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "onwards_model": "gpt-4"
    },
    "claude-3": {
      "url": "https://api.anthropic.com",
      "onwards_key": "sk-ant-your-anthropic-key"
    },
    "local-model": {
      "url": "http://localhost:8080"
    }
  }
}
```
Start the gateway:

```bash
cargo run -- -f config.json
```
Modifying the file will automatically and atomically reload the configuration (to disable this, set the `--watch` flag to false).
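
For example, one way to push a clean update is to write the new config to a temporary file and rename it over the original; rename is atomic on the same filesystem, so the watcher never sees a half-written file. This sketch assumes `jq` is installed and uses an illustrative model name:

```bash
# Rewrite one field, then atomically swap the new file into place.
jq '.targets["gpt-4"].onwards_model = "gpt-4-turbo"' config.json > config.json.tmp
mv config.json.tmp config.json  # the watcher picks up the completed file
```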
Configuration fields:

- `url`: The base URL of the AI provider
- `onwards_key`: API key to include in requests to the target (optional)
- `onwards_model`: Model name to use when forwarding requests (optional)
- `keys`: Array of API keys required for authentication to this target (optional)

CLI flags:

- `--targets <file>`: Path to the configuration file (required)
- `--port <port>`: Port to listen on (default: 3000)
- `--watch`: Enable configuration file watching for hot-reloading (default: true)
- `--metrics`: Enable the Prometheus metrics endpoint (default: true)
- `--metrics-port <port>`: Port for Prometheus metrics (default: 9090)
- `--metrics-prefix <prefix>`: Prefix for metric names (default: "onwards")

Get a list of all configured targets, in the OpenAI models format:
```bash
curl http://localhost:3000/v1/models
```
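
Putting the flags above together, an invocation that serves on a custom port and moves metrics elsewhere might look like this (a sketch; flag names as documented above):

```bash
cargo run -- --targets config.json --port 8080 --metrics-port 9100
```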
Send requests to the gateway using the standard OpenAI API format:
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
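
Because the proxy aims to be transparent, the response body is whatever the upstream provider returns. For an OpenAI-compatible target it looks roughly like this (abridged):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18 }
}
```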
Override the target using the `model-override` header:
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "model-override: claude-3" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
This is also used for routing requests without bodies - for example, to get the embeddings usage for your organization:
```bash
curl -X GET http://localhost:3000/v1/organization/usage/embeddings \
  -H "model-override: claude-3"
```
To enable Prometheus metrics, start the gateway with the `--metrics` flag, then access the metrics endpoint:

```bash
curl http://localhost:9090/metrics
```
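
A quick way to spot-check the endpoint is to filter for the configured prefix (this assumes metric names begin with the default prefix, "onwards"):

```bash
curl -s http://localhost:9090/metrics | grep '^onwards'
```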
Onwards supports bearer token authentication to control access to your AI targets. You can configure authentication keys both globally and per-target.
Global keys apply to all targets that have authentication enabled:
```json
{
  "auth": {
    "global_keys": ["global-api-key-1", "global-api-key-2"]
  },
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "keys": ["target-specific-key"]
    }
  }
}
```
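
With this configuration, a request authenticated with a global key should be accepted by `gpt-4` as well, since global keys are added to each target's key set:

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer global-api-key-1" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```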
You can also specify authentication keys for individual targets:
```json
{
  "targets": {
    "secure-gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "keys": ["secure-key-1", "secure-key-2"]
    },
    "open-local": {
      "url": "http://localhost:8080"
    }
  }
}
```
In this example:

- `secure-gpt-4` requires a valid bearer token from the `keys` array
- `open-local` has no authentication requirements

If both global and per-target keys are supplied, either kind will be accepted for targets that define their own keys.
When a target has `keys` configured, requests must include a valid `Authorization: Bearer <token>` header, where `<token>` matches one of the configured keys. If global keys are configured, they are automatically added to each target's key set.
Successful authenticated request:

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer secure-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "secure-gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Failed authentication (invalid key):

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer wrong-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "secure-gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Returns: 401 Unauthorized
```
Failed authentication (missing header):

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "secure-gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Returns: 401 Unauthorized
```
No authentication required:

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "open-local",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Success - no authentication required for this target
```
Onwards supports per-target rate limiting using a token bucket algorithm. This allows you to control the request rate to each AI provider independently.
Add rate limiting to any target in your `config.json`:
```json
{
  "targets": {
    "rate-limited-model": {
      "url": "https://api.provider.com",
      "onwards_key": "your-api-key",
      "rate_limit": {
        "requests_per_second": 5.0,
        "burst_size": 10
      }
    }
  }
}
```
We use a token bucket algorithm: each target gets its own bucket. Tokens are refilled at the rate set by `requests_per_second`, and the bucket holds at most `burst_size` tokens. When the bucket is empty, requests to that target are rejected with a `429 Too Many Requests` response. For example, with `requests_per_second: 5.0` and `burst_size: 10`, a client can send 10 requests back-to-back and is then throttled to a sustained 5 requests per second.
```jsonc
// Allow 1 request per second with a burst of 5
"rate_limit": {
  "requests_per_second": 1.0,
  "burst_size": 5
}

// Allow 100 requests per second with a burst of 200
"rate_limit": {
  "requests_per_second": 100.0,
  "burst_size": 200
}
```
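
For intuition, the token-bucket logic is roughly the following (an illustrative sketch, not the crate's actual implementation):

```rust
use std::time::Instant;

/// Illustrative token bucket; not onwards' actual implementation.
struct TokenBucket {
    tokens: f64,          // current tokens, capped at burst_size
    burst_size: f64,      // maximum bucket capacity ("burst_size")
    refill_rate: f64,     // tokens added per second ("requests_per_second")
    last_refill: Instant, // when the bucket was last topped up
}

impl TokenBucket {
    /// Returns true if the request may proceed; false means "respond 429".
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        // Refill in proportion to elapsed time, never exceeding the cap.
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.burst_size);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0; // spend one token on this request
            true
        } else {
            false // bucket empty: reject with 429 Too Many Requests
        }
    }
}
```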
Rate limiting is optional: targets without a `rate_limit` configuration have no rate limiting applied.
In addition to per-target rate limiting, Onwards supports individual rate limits for different API keys. This allows you to provide different service tiers to your users - for example, basic users might have lower limits while premium users get higher limits.
Per-key rate limiting uses a `key_definitions` section in the auth configuration:
```json
{
  "auth": {
    "global_keys": ["fallback-key"],
    "key_definitions": {
      "basic_user": {
        "key": "sk-user-12345",
        "rate_limit": {
          "requests_per_second": 10,
          "burst_size": 20
        }
      },
      "premium_user": {
        "key": "sk-premium-67890",
        "rate_limit": {
          "requests_per_second": 100,
          "burst_size": 200
        }
      },
      "enterprise_user": {
        "key": "sk-enterprise-abcdef",
        "rate_limit": {
          "requests_per_second": 500,
          "burst_size": 1000
        }
      }
    }
  },
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "keys": ["basic_user", "premium_user", "enterprise_user", "fallback-key"]
    }
  }
}
```
Rate limits are checked at both levels: the per-key limit for the authenticated API key and the per-target limit for the requested target. If either limit is exceeded, the request returns `429 Too Many Requests`.
Basic user request (10/sec limit):

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-user-12345" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Premium user request (100/sec limit):

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-premium-67890" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Legacy key (no per-key limits):

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer fallback-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Run the test suite:

```bash
cargo test
```