litellm-rs

Crates.io: litellm-rs
lib.rs: litellm-rs
version: 0.1.3
created_at: 2025-07-28 10:04:27 UTC
updated_at: 2025-09-18 00:51:32 UTC
description: A high-performance AI Gateway written in Rust, providing OpenAI-compatible APIs with intelligent routing, load balancing, and enterprise features
homepage: https://github.com/majiayu000/litellm-rs
repository: https://github.com/majiayu000/litellm-rs
size: 3,732,724
owner: majiayu000
documentation: https://docs.rs/litellm-rs

README

litellm-rs

A high-performance Rust library for unified LLM API access.

litellm-rs provides a simple, consistent interface to interact with multiple AI providers (OpenAI, Anthropic, Google, Azure, and more) through a single, unified API. Built with Rust's performance and safety guarantees, it simplifies multi-provider AI integration in production systems.

use litellm_rs::{completion, user_message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Works with any supported provider
    let response = completion(
        "gpt-4",  // or "claude-3", "gemini-pro", etc.
        vec![user_message("Hello!")],
        None,
    ).await?;

    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    Ok(())
}

Key Features

  • Unified API - Single interface for OpenAI, Anthropic, Google, Azure, and 100+ other providers
  • High Performance - Built in Rust with async/await for maximum throughput
  • Production Ready - Automatic retries, comprehensive error handling, and provider failover
  • Flexible Deployment - Use as a Rust library or deploy as a standalone HTTP gateway
  • OpenAI Compatible - Works with existing OpenAI client libraries and tools

Installation

Add this to your Cargo.toml:

[dependencies]
litellm-rs = "0.1.3"
tokio = { version = "1.0", features = ["full"] }
serde_json = "1.0"

Or build from source:

git clone https://github.com/majiayu000/litellm-rs.git
cd litellm-rs
cargo build --release

Usage

As a Library

Basic Example

use litellm_rs::{completion, user_message, system_message, CompletionOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Set your API key
    std::env::set_var("OPENAI_API_KEY", "your-openai-key");

    // Simple completion call
    let response = completion(
        "gpt-4",
        vec![user_message("Hello, how are you?")],
        None,
    ).await?;

    println!("Response: {}", response.choices[0].message.content.as_ref().unwrap());

    // With system message and options
    let response = completion(
        "gpt-4",
        vec![
            system_message("You are a helpful assistant."),
            user_message("Explain quantum computing"),
        ],
        Some(CompletionOptions {
            temperature: Some(0.7),
            max_tokens: Some(150),
            ..Default::default()
        }),
    ).await?;

    println!("AI: {}", response.choices[0].message.content.as_ref().unwrap());
    Ok(())
}

Using Multiple Providers

use litellm_rs::{completion, user_message, CompletionOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Set API keys for different providers
    std::env::set_var("OPENAI_API_KEY", "your-openai-key");
    std::env::set_var("ANTHROPIC_API_KEY", "your-anthropic-key");
    std::env::set_var("GROQ_API_KEY", "your-groq-key");

    // Call OpenAI
    let openai_response = completion(
        "gpt-4",
        vec![user_message("Hello from OpenAI!")],
        None,
    ).await?;

    // Call Anthropic Claude
    let claude_response = completion(
        "anthropic/claude-3-sonnet-20240229",
        vec![user_message("Hello from Claude!")],
        None,
    ).await?;

    // Call Groq (with reasoning)
    let groq_response = completion(
        "groq/deepseek-r1-distill-llama-70b",
        vec![user_message("Solve this math problem: 2+2=?")],
        Some(CompletionOptions {
            extra_params: {
                let mut params = std::collections::HashMap::new();
                params.insert("reasoning_effort".to_string(), serde_json::json!("medium"));
                params
            },
            ..Default::default()
        }),
    ).await?;

    println!("OpenAI: {}", openai_response.choices[0].message.content.as_ref().unwrap());
    println!("Claude: {}", claude_response.choices[0].message.content.as_ref().unwrap());
    println!("Groq: {}", groq_response.choices[0].message.content.as_ref().unwrap());

    Ok(())
}

Custom Endpoints

use litellm_rs::{completion, user_message, CompletionOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Call any OpenAI-compatible API with custom endpoint
    let response = completion(
        "llama-3.1-70b",  // Model name
        vec![user_message("Hello from custom endpoint!")],
        Some(CompletionOptions {
            api_key: Some("your-custom-api-key".to_string()),
            api_base: Some("https://your-custom-endpoint.com/v1".to_string()),
            ..Default::default()
        }),
    ).await?;

    println!("Custom API: {}", response.choices[0].message.content.as_ref().unwrap());
    Ok(())
}

As a Gateway Server

Start the server:

# Set your API keys
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"

# Start the proxy server
cargo run

# Server starts on http://localhost:8000

Make requests:

# OpenAI GPT-4
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'

# Anthropic Claude
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'
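
These requests can also be made from Rust with any HTTP client, since the gateway speaks the OpenAI wire format. Below is a minimal sketch using the reqwest (json feature) and serde_json crates, neither of which is a dependency of litellm-rs itself; it mirrors the first curl request above:

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();

    // Same payload as the curl examples above
    let body = json!({
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello, how are you?"}]
    });

    let response = client
        .post("http://localhost:8000/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .json::<serde_json::Value>()
        .await?;

    // Pull the assistant's reply out of the OpenAI-format response
    println!(
        "{}",
        response["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or("")
    );
    Ok(())
}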

Response (OpenAI Format)

{
    "id": "chatcmpl-1214900a-6cdd-4148-b663-b5e2f642b4de",
    "created": 1751494488,
    "model": "claude-3-sonnet",
    "object": "chat.completion",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "Hello! I'm doing well, thank you for asking. How are you doing today?",
                "role": "assistant"
            }
        }
    ],
    "usage": {
        "completion_tokens": 17,
        "prompt_tokens": 12,
        "total_tokens": 29
    }
}
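
If you prefer typed access over raw JSON, the response shape above maps directly onto a few serde structs. The structs below are an illustrative sketch of that mapping, not the crate's own response types (the library examples earlier read the same fields via response.choices[0].message.content):

use serde::Deserialize;

// Illustrative structs mirroring the OpenAI-format response shown above;
// the field names match the JSON keys, so no rename attributes are needed.
#[derive(Debug, Deserialize)]
struct ChatCompletion {
    id: String,
    created: u64,
    model: String,
    object: String,
    choices: Vec<Choice>,
    usage: Usage,
}

#[derive(Debug, Deserialize)]
struct Choice {
    finish_reason: String,
    index: u32,
    message: Message,
}

#[derive(Debug, Deserialize)]
struct Message {
    role: String,
    content: String,
}

#[derive(Debug, Deserialize)]
struct Usage {
    completion_tokens: u32,
    prompt_tokens: u32,
    total_tokens: u32,
}

fn main() -> Result<(), serde_json::Error> {
    // A trimmed copy of the response shown above
    let raw = r#"{
        "id": "chatcmpl-1214900a-6cdd-4148-b663-b5e2f642b4de",
        "created": 1751494488,
        "model": "claude-3-sonnet",
        "object": "chat.completion",
        "choices": [{
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Hello! How are you doing today?", "role": "assistant"}
        }],
        "usage": {"completion_tokens": 17, "prompt_tokens": 12, "total_tokens": 29}
    }"#;

    let parsed: ChatCompletion = serde_json::from_str(raw)?;
    println!("{}: {}", parsed.model, parsed.choices[0].message.content);
    Ok(())
}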

Call any model supported by a provider by setting model=<model_name>. See Supported Providers below for the complete list.

Streaming

litellm-rs supports streaming the model response back; pass "stream": true in the request body to receive a streaming response. Streaming is supported for all providers (OpenAI, Anthropic, Google, Azure, Groq, etc.).

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
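
The stream can also be consumed incrementally from Rust. The sketch below uses reqwest (stream feature) and futures-util, and assumes the gateway emits OpenAI-style data: {...} chunks as the curl example suggests; a production client would buffer partial lines and parse each JSON delta rather than printing raw bytes:

use futures_util::StreamExt;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();

    let response = client
        .post("http://localhost:8000/v1/chat/completions")
        .json(&json!({
            "model": "gpt-4",
            "messages": [{"role": "user", "content": "Tell me a story"}],
            "stream": true
        }))
        .send()
        .await?;

    // Read the body chunk by chunk instead of waiting for the full reply
    let mut stream = response.bytes_stream();
    while let Some(chunk) = stream.next().await {
        // Assumed to contain OpenAI-style "data: {...}" SSE lines
        print!("{}", String::from_utf8_lossy(&chunk?));
    }
    Ok(())
}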

Supported Providers

  • OpenAI - GPT-4, GPT-3.5, DALL-E
  • Anthropic - Claude 3 Opus, Sonnet, Haiku
  • Google - Gemini Pro, Gemini Flash
  • Azure OpenAI - Managed OpenAI deployments
  • Groq - High-speed Llama inference
  • AWS Bedrock - Claude, Llama, and more
  • And 95+ more providers...

View all providers →
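
Inside the library, the provider is selected purely by the model identifier, so switching between the providers above never changes the call site. A short sketch, reusing only model names that appear earlier in this README:

use litellm_rs::{completion, user_message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The provider is inferred from the model string; the call itself is identical.
    let models = [
        "gpt-4",                              // OpenAI
        "anthropic/claude-3-sonnet-20240229", // Anthropic
        "groq/deepseek-r1-distill-llama-70b", // Groq
    ];

    for model in models {
        let response = completion(model, vec![user_message("Say hi!")], None).await?;
        println!(
            "{model}: {}",
            response.choices[0].message.content.as_ref().unwrap()
        );
    }
    Ok(())
}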

Features

  • Unified Interface - Single API for 100+ providers
  • OpenAI Compatible - Drop-in replacement for OpenAI client
  • Streaming Support - Real-time response streaming
  • Automatic Retries - Built-in exponential backoff
  • Load Balancing - Distribute requests across providers
  • Cost Tracking - Monitor spending per request/user
  • Function Calling - Tool use across all capable models (see the sketch after this list)
  • Vision Support - Multimodal inputs for capable models
  • Custom Endpoints - Connect to self-hosted models
  • Request Caching - Reduce costs with intelligent caching
  • Rate Limiting - Protect against quota exhaustion
  • Observability - OpenTelemetry tracing and metrics
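
Most of these features need no extra code on the caller's side. Parameters that have no dedicated CompletionOptions field can be forwarded through extra_params, as the Groq reasoning example above does; the sketch below passes an OpenAI-style tool definition that way. Whether litellm-rs routes tool definitions through extra_params or exposes a dedicated field is an assumption here, so treat this as a hypothetical illustration and check the API docs:

use litellm_rs::{completion, user_message, CompletionOptions};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical: forward an OpenAI-style tool definition via extra_params,
    // the same mechanism the Groq example uses for reasoning_effort.
    let mut params = HashMap::new();
    params.insert(
        "tools".to_string(),
        serde_json::json!([{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"]
                }
            }
        }]),
    );

    let response = completion(
        "gpt-4",
        vec![user_message("What's the weather in Tokyo?")],
        Some(CompletionOptions {
            extra_params: params,
            ..Default::default()
        }),
    ).await?;

    // A tool call, if any, would be carried in the returned message
    println!("{:?}", response.choices[0].message);
    Ok(())
}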

Configuration

Create a config/gateway.yaml file:

server:
  host: "0.0.0.0"
  port: 8000

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
  google:
    api_key: "${GOOGLE_API_KEY}"

router:
  strategy: "round_robin"
  max_retries: 3
  timeout: 60

See config/gateway.yaml.example for a complete example.
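
To load or validate a config with the same shape from your own code, a minimal sketch with serde and serde_yaml is shown below; the structs are illustrative rather than the gateway's internal types, and the ${...} placeholders are expanded by the gateway at startup, not by this sketch:

use serde::Deserialize;
use std::collections::HashMap;

// Illustrative structs mirroring config/gateway.yaml above
#[derive(Debug, Deserialize)]
struct GatewayConfig {
    server: ServerConfig,
    providers: HashMap<String, ProviderConfig>,
    router: RouterConfig,
}

#[derive(Debug, Deserialize)]
struct ServerConfig {
    host: String,
    port: u16,
}

#[derive(Debug, Deserialize)]
struct ProviderConfig {
    api_key: String,
}

#[derive(Debug, Deserialize)]
struct RouterConfig {
    strategy: String,
    max_retries: u32,
    timeout: u64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("config/gateway.yaml")?;
    let config: GatewayConfig = serde_yaml::from_str(&raw)?;
    println!("listening on {}:{}", config.server.host, config.server.port);
    Ok(())
}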

Documentation

Full API documentation is available at https://docs.rs/litellm-rs.

Performance

Metric       Value            Notes
Throughput   10,000+ req/s    On an 8-core CPU
Latency      <10 ms           Routing overhead
Memory       ~50 MB           Base footprint
Startup      <100 ms          Cold start time

Deployment

Docker

docker run -p 8000:8000 \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  litellm-rs:latest

Kubernetes

kubectl apply -f deployment/kubernetes/

Binary

# Download the latest release
curl -L https://github.com/majiayu000/litellm-rs/releases/latest/download/litellm-rs-linux-amd64 -o litellm-rs
chmod +x litellm-rs
./litellm-rs

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

# Setup
git clone https://github.com/majiayu000/litellm-rs.git
cd litellm-rs

# Test
cargo test

# Format
cargo fmt

# Lint
cargo clippy

Roadmap

  • Core OpenAI-compatible API
  • 15+ provider integrations
  • Streaming support
  • Automatic retries and failover
  • Response caching
  • WebSocket support
  • Plugin system
  • Web dashboard

See GitHub Issues for the detailed roadmap.

License

Licensed under the MIT License. See LICENSE for details.

Acknowledgments

Special thanks to the Rust community and all contributors to this project.


Built with Rust for performance and reliability.
