| Crates.io | silt |
| lib.rs | silt |
| version | 0.1.4 |
| created_at | 2025-12-12 08:24:54.333337+00 |
| updated_at | 2025-12-12 10:21:14.866273+00 |
| description | A transparent batching proxy for the OpenAI API that accumulates real-time requests and dispatches at intervals using the OpenAI Batch API to achieve ~50% cost savings |
| homepage | https://github.com/doublewordai/silt |
| repository | https://github.com/doublewordai/silt |
| max_upload_size | |
| id | 1981195 |
| size | 109,318 |
A transparent batching proxy for the OpenAI API that accumulates real-time requests and dispatches at intervals using the OpenAI Batch API to achieve ~50% cost savings.
Includes support for long-lived 'real-time' requests: resumption of in-flight requests via idempotency keys, and TCP keepalives to avoid connection drops.
Client → Batch Proxy → OpenAI Batch API
              ↓               ↓
       Idempotency-Key  Batch File Upload
              ↓               ↓
         Redis State ← ← ← ← ← Batch Polling
Clients identify requests via the Idempotency-Key header, which keys the request state stored in Redis.
To build from source:
git clone https://github.com/doublewordai/silt
cd silt
cargo build --release
cp .env.example .env
# Edit .env with your settings
Required configuration:
REDIS_URL: Redis connection URL (default: redis://127.0.0.1:6379)
Optional configuration:
BATCH_WINDOW_SECS: How long to accumulate requests (default: 60)
BATCH_POLL_INTERVAL_SECS: Batch status polling interval (default: 60)
SERVER_HOST: Server bind address (default: 0.0.0.0)
SERVER_PORT: Server port (default: 8080)
TCP_KEEPALIVE_SECS: TCP keepalive interval (default: 60)
Start Redis, then run the proxy:
redis-server
cargo run --release
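As a rough sketch, a .env using the documented defaults could look like the lines below. This only covers the variables listed above; consult .env.example in the repository for the full set.
REDIS_URL=redis://127.0.0.1:6379
BATCH_WINDOW_SECS=60
BATCH_POLL_INTERVAL_SECS=60
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
TCP_KEEPALIVE_SECS=60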
The proxy is designed to work with the standard OpenAI Python client with minimal modifications:
import uuid
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy",  # Not validated by proxy
    timeout=3600,  # 1 hour timeout per attempt
)
# Generate a unique ID for this request (any stable, caller-chosen string works)
request_id = str(uuid.uuid4())
# Make request with idempotency key
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "Idempotency-Key": request_id
    },
)
print(response.choices[0].message.content)
If the long-running connection drops, simply repeat the same request with the same request ID to resume.
For production use, implement retry logic to handle connection drops. The idempotency key means the same request is resumed safely, without being double-charged or waiting twice as long.
import time
from openai import APIError, APITimeoutError
def batched_completion(messages, request_id, max_wait_hours=24):
    retry_delay = 30
    start_time = time.time()
    while time.time() - start_time < max_wait_hours * 3600:
        try:
            return client.chat.completions.create(
                model="gpt-4",
                messages=messages,
                extra_headers={"Idempotency-Key": request_id},
            )
        except (APITimeoutError, APIError) as e:
            print(f"Request interrupted ({e}); retrying in {retry_delay}s...")
            time.sleep(retry_delay)
            retry_delay = min(retry_delay * 1.5, 300)
    raise TimeoutError("Batch did not complete in time")
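Under the same assumptions as the snippets above (the client defined earlier and a caller-chosen key), a call might look like this:
result = batched_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    request_id=str(uuid.uuid4()),  # persist this key if you need to resume after a restart
)
print(result.choices[0].message.content)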
See example_client.py for a complete working example.
The proxy exposes a standard OpenAI-compatible endpoint:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Note: The Idempotency-Key header is optional. If not provided, the server will automatically generate a unique UUID for the request. However, you must provide your own key if you want to support connection resumption and retries - server-generated keys cannot be used for reconnection since the client doesn't know what was generated.
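As an illustration of that point, here is a minimal sketch of client-side key persistence, so a restarted client can resume the same in-flight request. The file name request.key and the file-based storage are hypothetical; any durable store works.
import os
import uuid
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="dummy", timeout=3600)
KEY_FILE = "request.key"  # hypothetical location for the persisted key

def load_or_create_key():
    # Reuse the stored key if this request was already submitted
    if os.path.exists(KEY_FILE):
        with open(KEY_FILE) as f:
            return f.read().strip()
    key = str(uuid.uuid4())
    with open(KEY_FILE, "w") as f:
        f.write(key)
    return key

request_id = load_or_create_key()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={"Idempotency-Key": request_id},
)
After a crash or redeploy, the same stored key is reused, so the repeated call attaches to the original batched request instead of creating a new one.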
How it works:
1. A request arrives with an Idempotency-Key and is stored in Redis as queued.
2. Every BATCH_WINDOW_SECS, the dispatcher collects all queued requests and uploads them as a batch file.
3. Requests move to processing; a worker polls every BATCH_POLL_INTERVAL_SECS until the batch completes.
4. Repeating a request with the same Idempotency-Key always returns the same result.
Perfect for:
Bulk, latency-tolerant workloads where the ~50% Batch API cost saving matters more than response time.
Not suitable for:
Interactive or latency-sensitive traffic, since responses are only returned once the underlying batch completes.
Run tests (requires Redis):
cargo test
Run with debug logging:
RUST_LOG=debug cargo run
MIT