| Crates.io | llm-edge-agent |
| lib.rs | llm-edge-agent |
| version | 0.1.0 |
| created_at | 2025-11-09 02:19:43.855723+00 |
| updated_at | 2025-11-09 02:19:43.855723+00 |
| description | Main LLM Edge Agent binary - High-performance LLM intercepting proxy |
| homepage | |
| repository | https://github.com/globalbusinessadvisors/llm-edge-agent |
| max_upload_size | |
| id | 1923517 |
| size | 151,207 |
High-performance LLM intercepting proxy with intelligent caching, routing, and observability.
The llm-edge-agent binary is the main executable for the LLM Edge Agent system - an enterprise-grade intercepting proxy for Large Language Model APIs. It sits between your applications and LLM providers (OpenAI, Anthropic, etc.), providing intelligent request routing, multi-tier caching, cost optimization, and comprehensive observability.
Key Features:
- Intelligent request routing across LLM providers
- Multi-tier caching (in-memory L1 and Redis L2)
- Cost optimization and per-request cost tracking
- Comprehensive observability (Prometheus metrics, distributed tracing, structured logging)

Install from crates.io:
cargo install llm-edge-agent
# Clone the repository
git clone https://github.com/globalbusinessadvisors/llm-edge-agent.git
cd llm-edge-agent
# Build the binary
cargo build --release --package llm-edge-agent
# The binary will be at: target/release/llm-edge-agent
# Pull the image (when published)
docker pull llm-edge-agent:latest
# Or build locally
docker build -t llm-edge-agent .
At least one LLM provider API key is required (OpenAI or Anthropic).
Optional infrastructure: Redis for the L2 cache and Prometheus for metrics.
Create a .env file or set environment variables:
# Required: At least one provider API key
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
# Optional: Server configuration
HOST=0.0.0.0
PORT=8080
METRICS_PORT=9090
# Optional: L2 Cache (Redis)
ENABLE_L2_CACHE=true
REDIS_URL=redis://localhost:6379
# Optional: Observability
ENABLE_TRACING=true
ENABLE_METRICS=true
RUST_LOG=info,llm_edge_agent=debug
Standalone (L1 cache only):
# Set API key
export OPENAI_API_KEY=sk-your-key
# Run the binary
llm-edge-agent
With full infrastructure (recommended):
# Start complete stack with Docker Compose
docker-compose -f docker-compose.production.yml up -d
# The agent will automatically connect to Redis, Prometheus, etc.
Check health:
curl http://localhost:8080/health
The proxy exposes an OpenAI-compatible API:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "Hello, world!"
}
]
}'
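The same request can be made from Rust. This is an illustrative client-side sketch assuming the reqwest (with the json feature), serde_json, and tokio crates as dependencies; none of these are part of this crate's API:

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();

    // Point the client at the proxy instead of the provider's API endpoint.
    let response = client
        .post("http://localhost:8080/v1/chat/completions")
        .bearer_auth("your-api-key")
        .json(&json!({
            "model": "gpt-3.5-turbo",
            "messages": [{ "role": "user", "content": "Hello, world!" }]
        }))
        .send()
        .await?;

    println!("{}", response.text().await?);
    Ok(())
}
```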
Response includes metadata:
{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "gpt-3.5-turbo",
"choices": [...],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 15,
"total_tokens": 25
},
"metadata": {
"provider": "openai",
"cached": false,
"cache_tier": null,
"latency_ms": 523,
"cost_usd": 0.000125
}
}
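A client that wants the extra fields can deserialize the metadata block. A minimal sketch assuming serde (derive feature) and serde_json; the struct below mirrors the example payload above and is a hypothetical type, not one exported by this crate:

```rust
use serde::Deserialize;

// Field names and types are inferred from the example response above;
// they are assumptions about the payload, not a type from llm-edge-agent.
#[derive(Debug, Deserialize)]
struct ResponseMetadata {
    provider: String,
    cached: bool,
    cache_tier: Option<String>,
    latency_ms: u64,
    cost_usd: f64,
}

fn main() -> Result<(), serde_json::Error> {
    let raw = r#"{
        "provider": "openai",
        "cached": false,
        "cache_tier": null,
        "latency_ms": 523,
        "cost_usd": 0.000125
    }"#;

    let metadata: ResponseMetadata = serde_json::from_str(raw)?;
    println!("served by {} in {} ms", metadata.provider, metadata.latency_ms);
    Ok(())
}
```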
Environment variables:

| Variable | Default | Description |
|---|---|---|
| HOST | 0.0.0.0 | Server bind address |
| PORT | 8080 | HTTP server port |
| METRICS_PORT | 9090 | Prometheus metrics port |
| OPENAI_API_KEY | - | OpenAI API key (required if using OpenAI) |
| ANTHROPIC_API_KEY | - | Anthropic API key (required if using Anthropic) |
| ENABLE_L2_CACHE | false | Enable Redis L2 cache |
| REDIS_URL | - | Redis connection URL |
| ENABLE_TRACING | true | Enable distributed tracing |
| ENABLE_METRICS | true | Enable Prometheus metrics |
| RUST_LOG | info | Logging configuration |
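The table above maps directly onto environment lookups at startup. The following sketch shows that pattern using only the standard library; ConfigSketch is a hypothetical struct for illustration, not the crate's actual AppConfig:

```rust
use std::env;

// Hypothetical struct for illustration; the real crate exposes AppConfig::from_env().
#[derive(Debug)]
struct ConfigSketch {
    host: String,
    port: u16,
    metrics_port: u16,
    enable_l2_cache: bool,
    redis_url: Option<String>,
}

impl ConfigSketch {
    fn from_env() -> Self {
        Self {
            host: env::var("HOST").unwrap_or_else(|_| "0.0.0.0".into()),
            port: env::var("PORT").ok().and_then(|v| v.parse().ok()).unwrap_or(8080),
            metrics_port: env::var("METRICS_PORT").ok().and_then(|v| v.parse().ok()).unwrap_or(9090),
            enable_l2_cache: env::var("ENABLE_L2_CACHE").map(|v| v == "true").unwrap_or(false),
            redis_url: env::var("REDIS_URL").ok(),
        }
    }
}

fn main() {
    println!("{:?}", ConfigSketch::from_env());
}
```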
Main Proxy Endpoint:
- POST /v1/chat/completions - OpenAI-compatible chat completions

Health & Monitoring:
- GET /health - Detailed system health status
- GET /health/ready - Kubernetes readiness probe
- GET /health/live - Kubernetes liveness probe
- GET /metrics - Prometheus metrics

Supported providers: OpenAI and Anthropic. The proxy automatically routes requests to the appropriate provider based on the model name.
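Routing on the model name can be pictured as a simple prefix match. The rules below (gpt-* to OpenAI, claude-* to Anthropic) are assumptions for illustration, not the crate's exact routing table:

```rust
#[derive(Debug, PartialEq)]
enum Provider {
    OpenAi,
    Anthropic,
}

// Illustrative prefix-based routing; the real routing layer also considers
// cost, latency, and failover (see the architecture section below).
fn route_by_model(model: &str) -> Option<Provider> {
    if model.starts_with("gpt-") {
        Some(Provider::OpenAi)
    } else if model.starts_with("claude-") {
        Some(Provider::Anthropic)
    } else {
        None
    }
}

fn main() {
    assert_eq!(route_by_model("gpt-3.5-turbo"), Some(Provider::OpenAi));
    assert_eq!(route_by_model("claude-3-haiku"), Some(Provider::Anthropic));
    assert_eq!(route_by_model("unknown-model"), None);
}
```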
The binary integrates all LLM Edge Agent components:
llm-edge-agent (binary)
├── HTTP Server (Axum)
│ ├── Request validation
│ ├── Health check endpoints
│ └── Metrics endpoint
│
├── Cache Layer (llm-edge-cache)
│ ├── L1: In-memory (Moka)
│ └── L2: Distributed (Redis)
│
├── Routing Layer (llm-edge-routing)
│ ├── Model-based routing
│ ├── Cost optimization
│ ├── Latency optimization
│ └── Failover support
│
├── Provider Layer (llm-edge-providers)
│ ├── OpenAI adapter
│ └── Anthropic adapter
│
└── Observability (llm-edge-monitoring)
├── Prometheus metrics
├── Distributed tracing
└── Structured logging
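The request path through these layers can be summarized as: check L1, then L2, then route to a provider and backfill both tiers. A simplified, synchronous sketch of that flow; the type and method names here are illustrative stand-ins, not the crate's actual interfaces:

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the cache and provider layers.
struct TieredCache {
    l1: HashMap<String, String>,         // in-memory tier
    l2: Option<HashMap<String, String>>, // optional distributed tier (Redis in the real system)
}

impl TieredCache {
    fn get(&self, key: &str) -> Option<(String, &'static str)> {
        if let Some(v) = self.l1.get(key) {
            return Some((v.clone(), "l1"));
        }
        // Fall through to L2 only when it is configured; when L2 is unavailable
        // the agent degrades to L1-only mode instead of failing the request.
        self.l2
            .as_ref()
            .and_then(|l2| l2.get(key))
            .map(|v| (v.clone(), "l2"))
    }

    fn put(&mut self, key: &str, value: &str) {
        self.l1.insert(key.to_string(), value.to_string());
        if let Some(l2) = self.l2.as_mut() {
            l2.insert(key.to_string(), value.to_string());
        }
    }
}

fn handle_request(cache: &mut TieredCache, prompt: &str) -> String {
    if let Some((hit, tier)) = cache.get(prompt) {
        return format!("{hit} (cached, tier={tier})");
    }
    // Cache miss: route to a provider (stubbed here) and backfill both tiers.
    let response = format!("provider response for '{prompt}'");
    cache.put(prompt, &response);
    response
}

fn main() {
    let mut cache = TieredCache { l1: HashMap::new(), l2: Some(HashMap::new()) };
    println!("{}", handle_request(&mut cache, "Hello, world!")); // miss -> provider
    println!("{}", handle_request(&mut cache, "Hello, world!")); // hit  -> L1
}
```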
Docker:
# Build image
docker build -t llm-edge-agent .
# Run container
docker run -d \
-p 8080:8080 \
-p 9090:9090 \
-e OPENAI_API_KEY=sk-your-key \
-e ENABLE_L2_CACHE=false \
--name llm-edge-agent \
llm-edge-agent:latest
See docker-compose.production.yml in the repository root for a complete production-ready stack, including Redis and Prometheus alongside the agent.
Kubernetes:
# Create namespace
kubectl create namespace llm-edge-production
# Create secrets
kubectl create secret generic llm-edge-secrets \
--from-literal=openai-api-key="sk-..." \
--from-literal=anthropic-api-key="sk-ant-..." \
-n llm-edge-production
# Deploy
kubectl apply -f deployments/kubernetes/llm-edge-agent.yaml
The binary exposes Prometheus metrics on the configured METRICS_PORT:
Request Metrics:
- llm_edge_requests_total - Total request count
- llm_edge_request_duration_seconds - Request latency histogram
- llm_edge_request_errors_total - Error count by type

Cache Metrics:
- llm_edge_cache_hits_total{tier="l1|l2"} - Cache hits
- llm_edge_cache_misses_total - Cache misses
- llm_edge_cache_latency_seconds - Cache operation latency

Provider Metrics:
- llm_edge_provider_latency_seconds - Provider response time
- llm_edge_provider_errors_total - Provider errors
- llm_edge_cost_usd_total - Cumulative cost tracking

Token Metrics:
- llm_edge_tokens_used_total - Token usage by provider/model
- llm_edge_tokens_prompt_total - Prompt tokens
- llm_edge_tokens_completion_total - Completion tokens

Health endpoint response:
{
"status": "healthy",
"timestamp": "2025-01-08T12:00:00Z",
"version": "1.0.0",
"cache": {
"l1_healthy": true,
"l2_healthy": true,
"l2_configured": true
},
"providers": {
"openai": {
"configured": true,
"healthy": true
},
"anthropic": {
"configured": true,
"healthy": true
}
}
}
Benchmarks and cache performance figures, along with comprehensive documentation, are available in the README at the repository root.
This crate can be used both as a binary and as a library:
As a Binary:
cargo run --package llm-edge-agent
As a Library:
[dependencies]
llm-edge-agent = "0.1.0"
use llm_edge_agent::{AppConfig, initialize_app_state, handle_chat_completions};

#[tokio::main]
async fn main() {
    // Build configuration from environment variables and wire up the shared state
    let config = AppConfig::from_env();
    let state = initialize_app_state(config).await.unwrap();

    // Use in your own Axum router:
    // let app = Router::new()
    //     .route("/v1/chat/completions", post(handle_chat_completions))
    //     .with_state(Arc::new(state));
}
Provider not available:
Error: No providers configured
Solution: Set at least one API key (OPENAI_API_KEY or ANTHROPIC_API_KEY)
Redis connection failed:
Warning: L2 cache enabled but connection failed
Solution: Verify Redis is running and REDIS_URL is correct. Agent will fall back to L1-only mode.
High latency:
- Check system health: curl http://localhost:8080/health
- Inspect Prometheus metrics: curl http://localhost:9090/metrics
- Enable debug logging: RUST_LOG=debug

Licensed under the Apache License, Version 2.0. See LICENSE for details.
See the Contributing Guide in the repository root.