llm-edge-proxy


Core HTTP proxy functionality for LLM Edge Agent: a high-performance, production-ready HTTP/HTTPS proxy server for LLM requests.

Overview

This crate provides the foundational HTTP/HTTPS server layer with:

  • High-Performance Server: Axum 0.8 + Hyper 1.0 with HTTP/2 support
  • TLS Termination: Memory-safe TLS with Rustls 0.23
  • Authentication: API key validation (x-api-key, Bearer token)
  • Rate Limiting: tower-governor with configurable limits
  • Request Handling: Timeouts, size limits, validation
  • Observability: Structured JSON logging with OpenTelemetry integration
  • Health Checks: Kubernetes-compatible endpoints
  • Metrics: Prometheus-compatible metrics endpoint

Features

Security & Authentication

  • API key authentication via headers
  • SHA-256 hashed key support (see the sketch after this list)
  • Configurable public endpoints
  • Request size limits (10MB default)
  • Request timeout (30s default)
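
For hashed keys, validation amounts to hashing the presented key and comparing digests, so plaintext keys never need to be stored. A minimal sketch, assuming the sha2 crate; key_is_valid and the hex-encoded storage format are illustrative, not this crate's API:

use sha2::{Digest, Sha256};

// Illustrative only: compare a presented API key against hex-encoded
// SHA-256 digests of the configured keys.
fn key_is_valid(presented: &str, hashed_keys: &[String]) -> bool {
    let digest = format!("{:x}", Sha256::digest(presented.as_bytes()));
    hashed_keys.iter().any(|stored| *stored == digest)
}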

Performance

  • Target: <5ms latency overhead
  • Expected: >20,000 req/s throughput
  • Memory: ~50MB base + ~100KB per connection
  • Zero-copy where possible
  • Async/await throughout

Observability

  • Structured JSON logging (see the sketch after this list)
  • Request tracing with correlation IDs
  • Prometheus metrics
  • OpenTelemetry integration (prepared)
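
As a sketch of what structured JSON logging looks like in this stack (assuming tracing-subscriber with its json and env-filter features enabled; not necessarily this crate's exact initialization):

fn main() {
    // Illustrative only: emit one JSON object per log line, with the
    // filter read from the RUST_LOG environment variable.
    tracing_subscriber::fmt()
        .json()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .init();

    tracing::info!(path = "/health", "server started");
}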

OpenAI-Compatible API

  • POST /v1/chat/completions - Chat completion endpoint
  • POST /v1/completions - Legacy completion endpoint
  • Note: Returns mock responses in Layer 1

Installation

Add this to your Cargo.toml:

[dependencies]
llm-edge-proxy = "0.1.0"
tokio = { version = "1.40", features = ["full"] }
anyhow = "1.0"

Usage

As a Library

use llm_edge_proxy::{Config, build_app, serve};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Load configuration
    let config = Config::from_env()?;

    // Build application with middleware
    let app = build_app(config.clone()).await?;

    // Start server
    let addr = config.server.address.parse()?;
    serve(addr, app).await?;

    Ok(())
}

Standalone Usage (Outside Workspace)

When using this crate independently:

use llm_edge_proxy::{Config, ProxyConfig, ServerConfig, build_app, serve};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Manual configuration
    let config = Config {
        server: ServerConfig {
            address: "0.0.0.0:8080".to_string(),
            timeout_seconds: 30,
            max_request_size: 10_485_760,
        },
        proxy: ProxyConfig {
            auth_enabled: true,
            api_keys: vec!["your-api-key".to_string()],
            rate_limit_enabled: true,
            rate_limit_rpm: 1000,
            rate_limit_burst: 100,
        },
    };

    let app = build_app(config.clone()).await?;
    let addr = config.server.address.parse()?;
    serve(addr, app).await?;

    Ok(())
}

Configuration

All configuration is supplied via environment variables:

# Server
SERVER_ADDRESS=0.0.0.0:8080
SERVER_TIMEOUT_SECONDS=30
MAX_REQUEST_SIZE=10485760

# Authentication
AUTH_ENABLED=true
API_KEYS=key1,key2
AUTH_HEALTH_CHECK=false

# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_RPM=1000
RATE_LIMIT_BURST=100

# Observability
LOG_LEVEL=info
ENABLE_TRACING=true
ENABLE_METRICS=true
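
Unset variables fall back to defaults. Illustratively, a variable such as RATE_LIMIT_RPM might be read and parsed along these lines; the crate's own Config::from_env handles this internally, so this snippet is a sketch, not its implementation:

fn main() {
    // Illustrative only: read RATE_LIMIT_RPM, falling back to the value
    // used in the examples above when the variable is unset or invalid.
    let rpm: u32 = std::env::var("RATE_LIMIT_RPM")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(1000);
    println!("rate limit: {rpm} requests/minute");
}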

API Endpoints

Health Checks

# General health
GET /health

# Kubernetes readiness
GET /health/ready

# Kubernetes liveness
GET /health/live

Metrics

# Prometheus metrics
GET /metrics

LLM Proxy

# Chat completions (OpenAI-compatible)
POST /v1/chat/completions
Content-Type: application/json
x-api-key: your-api-key

{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}
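
In Layer 1 the handler answers with a mock payload; a successful response follows the standard OpenAI chat-completion shape, roughly as below (all field values are illustrative):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
}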

Architecture

┌──────────────────────────────────────┐
│      TLS Termination (Rustls)        │
└────────────────┬─────────────────────┘
                 ↓
┌──────────────────────────────────────┐
│   Authentication Middleware          │
│   (API key validation)               │
└────────────────┬─────────────────────┘
                 ↓
┌──────────────────────────────────────┐
│   Rate Limiting Middleware           │
│   (tower-governor)                   │
└────────────────┬─────────────────────┘
                 ↓
┌──────────────────────────────────────┐
│   Request Timeout Middleware         │
└────────────────┬─────────────────────┘
                 ↓
┌──────────────────────────────────────┐
│   Route Handlers                     │
│   (health, metrics, proxy endpoints) │
└──────────────────────────────────────┘
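
As a sketch of how such a stack composes in Axum (not this crate's exact code): Axum applies .layer bottom-up, so the first layer added sits closest to the handlers and the last added runs earliest in the request path. The rate-limiting and auth layers are left as commented placeholders since their construction is crate-specific.

use std::time::Duration;
use axum::{routing::get, Router};
use tower_http::timeout::TimeoutLayer;

// Illustrative only: the timeout is added first, so it wraps the handlers;
// rate limiting and auth (commented out) would be added after it and
// therefore see each request before it reaches the timeout layer.
fn sketch_router() -> Router {
    Router::new()
        .route("/health", get(|| async { "ok" }))
        .layer(TimeoutLayer::new(Duration::from_secs(30)))
        // .layer(governor_layer) // rate limiting via tower-governor
        // .layer(auth_layer)     // API key validation middleware
}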

Module Structure

  • config/ - Configuration management
  • error.rs - Error types and response handling
  • middleware/ - Authentication, rate limiting, timeout
  • server/ - HTTP server, routes, TLS, tracing
  • lib.rs - Public API

Development

Run Tests

cargo test --package llm-edge-proxy

Build

# Debug
cargo build --package llm-edge-proxy

# Release (optimized)
cargo build --package llm-edge-proxy --release

Run Locally

# Set environment
export AUTH_ENABLED=false
export RATE_LIMIT_ENABLED=false

# Run (requires main binary in llm-edge-agent crate)
cargo run --package llm-edge-agent

Performance Characteristics

Metric              Target
Latency overhead    <5ms (P95)
Throughput          >5,000 req/s
Memory usage        <512MB
CPU usage (idle)    <2%

Dependencies

Key dependencies:

  • axum 0.8 - Web framework
  • hyper 1.0 - HTTP implementation
  • tower 0.5 - Middleware
  • tower-http 0.6 - HTTP middleware
  • tower_governor 0.4 - Rate limiting
  • rustls 0.23 - TLS
  • tokio 1.40 - Async runtime
  • tracing 0.1 - Logging/tracing

Integration with Layer 2

Layer 1 provides integration points for Layer 2:

  1. State Management: Easy to extend with cache, providers, etc. (see the sketch after this list)
  2. Middleware Chain: Extensible middleware stack
  3. Error Handling: Structured errors ready for orchestration
  4. Tracing: Request correlation for distributed tracing
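
For example, a Layer 2 build might wrap the Layer 1 configuration in a broader shared state. A minimal sketch; AppState and the commented-out fields are hypothetical names, not part of this crate:

use std::sync::Arc;
use llm_edge_proxy::Config;

// Hypothetical Layer 2 state: Layer 1's Config plus whatever Layer 2
// adds (cache, provider pool, router). Cloning stays cheap via Arc.
#[derive(Clone)]
struct AppState {
    config: Arc<Config>,
    // cache: Arc<ResponseCache>,
    // providers: Arc<ProviderPool>,
}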

Contributing

Contributions are welcome! Please see the main repository (https://github.com/globalbusinessadvisors/llm-edge-agent) for guidelines.

Related Crates

This crate is part of the LLM Edge Agent ecosystem.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Status

Layer 1 Complete (1,027 LOC)

  • Core HTTP server: ✅
  • TLS support: ✅
  • Authentication: ✅
  • Rate limiting: ✅
  • Health checks: ✅
  • Metrics: ✅
  • Tests: ✅

Next: Layer 2 (Caching, Providers, Routing)
