llmkit

Crates.io: llmkit
lib.rs: llmkit
version: 0.1.3
created_at: 2026-01-12 02:51:42 UTC
updated_at: 2026-01-16 03:09:17 UTC
description: Production-grade LLM client - 100+ providers, 11,000+ models. Pure Rust.
repository: https://github.com/yfedoseev/llmkit
id: 2036913
size: 4,858,578
owner: Yury Fedoseev (yfedoseev)

README

LLMKit

The production-grade LLM client. One API for 100+ providers. Pure Rust core with native bindings.

11,000+ models · 100+ providers · Rust | Python | Node.js

                         ┌──────────────┐
                         │  Rust Core   │
                         └──────┬───────┘
          ┌──────────┬─────────┼─────────┬──────────┐
          ▼          ▼         ▼         ▼          ▼
      ┌───────┐  ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
      │Python │  │ Node  │ │ WASM  │ │  Go   │ │  ...  │
      │  ✅   │  │  ✅   │ │ Soon  │ │ Soon  │ │       │
      └───────┘  └───────┘ └───────┘ └───────┘ └───────┘

Documentation · Changelog · Contributing

Why LLMKit?

Built for Production

LLMKit is written in pure Rust — no Python runtime, no garbage collector, no memory leaks. Deploy with confidence knowing your LLM infrastructure won't degrade over time or crash under load.

  • Memory Safety — Rust's ownership model rules out use-after-free, double-free, and data races by design
  • True Concurrency — No GIL. Handle thousands of concurrent streams efficiently
  • Minimal Footprint — Native binary, not a 150MB Python package
  • Run Forever — No worker restarts, no memory bloat, no surprises

Features That Actually Work

  • Prompt Caching — Native support for Anthropic, OpenAI, Google, DeepSeek. Save up to 90% on API costs
  • Extended Thinking — Unified API for reasoning across 5 providers (Anthropic, OpenAI, Google, DeepSeek, OpenRouter)
  • Streaming — Zero-copy streaming with automatic request deduplication
  • 11,000+ Model Registry — Pricing, context limits, and capabilities baked in. No external API calls

Production Features

Feature          Description
Smart Router     ML-based provider selection optimizing for latency, cost, or reliability
Circuit Breaker  Automatic failure detection and recovery with anomaly detection
Rate Limiting    Lock-free, hierarchical rate limiting at scale
Cost Tracking    Multi-tenant metering with cache-aware pricing
Guardrails       PII detection, secret scanning, prompt injection prevention
Observability    OpenTelemetry integration for tracing and metrics
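
As a purely illustrative sketch of how these pieces might fit together (every with_* name below is hypothetical, not llmkit's confirmed API), a hardened client could look roughly like this:

from llmkit import CompletionRequest, LLMKitClient, Message

# Hypothetical builder names - placeholders for the features in the
# table above, not llmkit's actual configuration API.
client = (
    LLMKitClient.from_env()
    .with_router(optimize_for="latency")          # Smart Router
    .with_circuit_breaker(failure_threshold=0.5)  # Circuit Breaker
    .with_rate_limit(requests_per_minute=600)     # Rate Limiting
)
response = client.complete(
    CompletionRequest(model="openai/gpt-4o", messages=[Message.user("Hello!")])
)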

Quick Start

Rust

use llmkit::{LLMKitClient, Message, CompletionRequest};

#[tokio::main] // assumes the tokio async runtime
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = LLMKitClient::from_env()?;
    let response = client
        .complete(CompletionRequest::new(
            "anthropic/claude-sonnet-4-20250514",
            vec![Message::user("Hello!")],
        ))
        .await?;
    println!("{}", response.text_content());
    Ok(())
}

Python

from llmkit import LLMKitClient, Message, CompletionRequest

client = LLMKitClient.from_env()
response = client.complete(CompletionRequest(
    model="openai/gpt-4o",
    messages=[Message.user("Hello!")]
))
print(response.text_content())

Node.js

import { LLMKitClient, Message, CompletionRequest } from 'llmkit-node'

const client = LLMKitClient.fromEnv()
const response = await client.complete(
  new CompletionRequest('anthropic/claude-sonnet-4-20250514', [Message.user('Hello!')])
)
console.log(response.textContent())

Installation

Rust

[dependencies]
llmkit = { version = "0.1", features = ["anthropic", "openai"] }

Python

pip install llmkit-python

Node.js

npm install llmkit-node

Features

Chat               Media             Specialized
Streaming          Image Generation  Embeddings
Tool Calling       Vision/Images     Token Counting
Structured Output  Audio STT/TTS     Batch Processing
Extended Thinking  Video Generation  Model Registry
Prompt Caching                       11,000+ Models

Providers

Category        Providers
Core            Anthropic, OpenAI, Azure OpenAI
Cloud           AWS Bedrock, Google Vertex AI, Google AI
Fast Inference  Groq, Mistral, Cerebras, SambaNova, Fireworks, DeepSeek
Enterprise      Cohere, AI21
Hosted          Together, Perplexity, DeepInfra, OpenRouter
Local           Ollama, LM Studio, vLLM
Audio           Deepgram, ElevenLabs
Video           Runware

See PROVIDERS.md for the full list with environment variables.

Examples

Streaming

use futures::StreamExt; // brings `next()` into scope for the stream

let mut stream = client.complete_stream(request).await?;
while let Some(chunk) = stream.next().await {
    if let Some(text) = chunk?.text() { print!("{}", text); }
}

Tool Calling

from llmkit import ToolBuilder

tool = ToolBuilder("get_weather") \
    .description("Get current weather") \
    .string_param("city", "City name", required=True) \
    .build()

request = CompletionRequest(model, messages).with_tools([tool])
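
A typical round trip then executes the requested tool and feeds its result back to the model. Continuing the example above, the response accessors below (tool_calls, call.id, Message.tool_result) are assumptions for illustration, not confirmed llmkit API:

response = client.complete(request)

# Hypothetical accessors - illustrative names only.
for call in response.tool_calls:
    if call.name == "get_weather":
        result = get_weather(city=call.arguments["city"])  # your own function
        messages.append(Message.tool_result(call.id, result))

# Send the tool result back for the final answer
response = client.complete(CompletionRequest(model, messages).with_tools([tool]))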

Prompt Caching

# Cache large system prompts - save up to 90% on repeated calls
request = CompletionRequest(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[Message.system(large_prompt), Message.user("Question")]
).with_cache()
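
The savings figure comes from providers billing cached input tokens at a fraction of the base rate (roughly one-tenth on Anthropic, per its published pricing). Back-of-the-envelope, with illustrative numbers:

# Illustrative arithmetic only - rates vary by provider and model,
# and the first call pays a small cache-write surcharge, omitted here.
prompt_tokens = 100_000        # large system prompt, re-sent on every call
base_rate = 3.00 / 1_000_000   # e.g. $3 per 1M input tokens
cache_rate = base_rate * 0.10  # cached reads at ~10% of the base rate

full_price = 100 * prompt_tokens * base_rate     # 100 calls uncached: $30.00
cached_price = 100 * prompt_tokens * cache_rate  # 100 calls cached:    $3.00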

Extended Thinking

// Unified reasoning API across providers
const request = new CompletionRequest('anthropic/claude-sonnet-4-20250514', messages)
  .withThinking({ budgetTokens: 10000 })

const response = await client.complete(request)
console.log(response.thinkingContent()) // See the reasoning process
console.log(response.textContent())     // Final answer

Model Registry

from llmkit import get_model_info, get_models_by_provider

# Get model details - no API calls, instant lookup
info = get_model_info("anthropic/claude-sonnet-4-20250514")
print(f"Context: {info.context_window}, Price: ${info.input_price}/1M tokens")

# Find models by provider
anthropic_models = get_models_by_provider("anthropic")
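
Because the registry ships with the library, it can back cheap pre-flight checks before any network call. A minimal sketch continuing the example above, using only the fields already shown; the token count is a stand-in for a real tokenizer:

# Pre-flight sketch built on the registry fields used above.
prompt_tokens = 200_000  # stand-in; measure with a real tokenizer
if prompt_tokens > info.context_window:
    raise ValueError("prompt exceeds the model's context window")

# input_price is quoted per 1M tokens, as in the print() above
estimated_cost = prompt_tokens / 1_000_000 * info.input_price
print(f"Estimated input cost: ${estimated_cost:.2f}")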

For more examples, see examples/.

Building from Source

git clone https://github.com/yfedoseev/llmkit
cd llmkit
cargo build --release
cargo test

# Python bindings
cd llmkit-python && maturin develop

# Node.js bindings
cd llmkit-node && pnpm install && pnpm build

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

Dual-licensed under MIT or Apache-2.0.


Built with Rust · Production Ready · GitHub
