cllient

Crates.io: cllient
lib.rs: cllient
version: 0.1.2
created_at: 2025-10-28 06:01:16.045973+00
updated_at: 2025-12-26 09:49:27.747531+00
description: A comprehensive Rust client for LLM APIs with unified interface and model management
homepage: https://github.com/JuggernautLabs/cllient
repository: https://github.com/JuggernautLabs/cllient
id: 1904244
size: 591,836
owner: Ben Haware
documentation: https://docs.rs/cllient

README

cllient

A config-driven LLM client in Rust. Define providers and models in YAML instead of code.

┌──────────────────────────────────────────────────────────────────┐
│  YAML Configs                                                    │
│  ┌─────────────────┐    ┌─────────────────┐                      │
│  │ service/        │    │ family/         │                      │
│  │  openai.yaml    │◄───│  gpt/4o.yaml    │  model references    │
│  │  anthropic.yaml │◄───│  claude/*.yaml  │  service by name     │
│  │  deepseek.yaml  │◄───│  deepseek/*.yaml│                      │
│  └─────────────────┘    └─────────────────┘                      │
│          │                      │                                │
│          ▼                      ▼                                │
│  ┌─────────────────────────────────────────┐                     │
│  │           ModelRegistry                 │                     │
│  │   - Loads configs (embedded or files)   │                     │
│  │   - Renders Handlebars templates        │                     │
│  │   - Substitutes env vars (API keys)     │                     │
│  └─────────────────────────────────────────┘                     │
│                        │                                         │
│                        ▼                                         │
│  ┌─────────────────────────────────────────┐                     │
│  │           HTTP + SSE Streaming          │                     │
│  │   Provider-specific parsers (OpenAI,    │                     │
│  │   Anthropic, Google) extract content    │                     │
│  └─────────────────────────────────────────┘                     │
│                        │                                         │
│                        ▼                                         │
│  ┌─────────────────────────────────────────┐                     │
│  │        Streaming JSON Output            │                     │
│  │   Valid JSON emitted incrementally -    │                     │
│  │   watch the response build in real-time │                     │
│  └─────────────────────────────────────────┘                     │
└──────────────────────────────────────────────────────────────────┘

Experimental: Proof-of-concept with 3 tested providers (OpenAI, Anthropic, DeepSeek). 242 additional models available via OpenRouter but not validated. Not production-ready.

Features

  • Config-driven - Add providers via YAML, no code changes
  • SSE streaming - Real-time token streaming with provider-specific parsers
  • Streaming JSON output - Emit valid JSON incrementally as tokens arrive (pipe-friendly)
  • 339 model configs - 97 direct integrations + 242 via OpenRouter
  • Model verification tracking - Know which models are tested vs auto-generated
  • Registry export - Single-call access to full registry for RPC/integrations
  • CLI + library - Use from command line or as a Rust crate
  • Embedded configs - Ship as a single binary with all configs baked in

Limitations

  • Most configs are untested (especially OpenRouter models)
  • Minimal error handling
  • YAML templates can be fragile
  • Research project, not production software

Quick Start

Installation

# Install from source
cargo install --path .

# Set up API keys
cp .env.example .env
# Edit .env with your API keys

Basic Usage

# List available models (alias: ls)
cllient list

# Simple completion (JSON output by default)
cllient ask gpt-4o-mini "What is Rust programming?"

# Streaming - outputs valid JSON incrementally as tokens arrive
cllient stream deepseek-chat "Tell me a story"

# Human-readable output with decorations
cllient --pretty stream deepseek-chat "Tell me a story"

# Clean output for piping (just the response, no JSON)
cllient --clean ask gpt-4o-mini "Hello" | wc -w

# Interactive chat
cllient chat claude-3-haiku-20240307

# Compare models
cllient compare gpt-4o-mini,claude-3-haiku "Explain quantum computing"

# Get help on any command
cllient --help
cllient ask --help

CLI Reference

cllient [OPTIONS] <COMMAND>

Commands:
  list (ls)       List available models
  list-services   List available services/providers
  ask             Single completion request
  stream          Streaming completion request
  chat            Interactive chat session
  compare         Compare multiple models on same prompt
  debug-response  Debug API response issues

Options:
  -v, --verbose   Enable debug logging
      --pretty    Human-readable output with decorations
      --clean     Raw response only (for piping)
  -h, --help      Print help (works on all subcommands)
  -V, --version   Print version

Programmatic Usage

use cllient::{ModelRegistry, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let registry = ModelRegistry::new()?;

    // Simple single message
    let response = registry
        .from_id("gpt-4o-mini")?
        .prompt("Hello, world!")
        .send()
        .await?;

    // Multi-turn conversation
    let response = registry
        .from_id("claude-3-haiku-20240307")?
        .messages(vec![
            Message::system("You are a helpful assistant"),
            Message::user("What is 2+2?"),
            Message::assistant("4"),
            Message::user("And 4+4?"),
        ])
        .send()
        .await?;

    // Use cheapest model matching pattern
    let response = registry
        .use_cheapest("claude-*")?
        .prompt("Explain AI")
        .send()
        .await?;

    println!("Response: {}", response.content);
    Ok(())
}

Streaming

The library provides real-time streaming via StreamEvent. Each event represents a piece of the response as it arrives:

use cllient::{ClientFactory, EmbeddedConfigLoader, EmbeddedClientFactory, CompletionRequest};
use cllient::streaming::StreamEvent;
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let loader = EmbeddedConfigLoader::new()?;
    let factory = EmbeddedClientFactory::new(loader);
    let client = factory.create_client("gpt-4o-mini")?;

    let request = CompletionRequest::text("user", "Tell me a story")
        .with_streaming(true);

    let mut stream = client.complete_stream(&request).await?;

    while let Some(event) = stream.next().await {
        match event? {
            StreamEvent::Content(text) => print!("{}", text),  // Token chunk
            StreamEvent::Start => println!("--- stream started ---"),
            StreamEvent::Finish(reason) => println!("\n--- done: {:?} ---", reason),
            StreamEvent::Usage { input_tokens, output_tokens, .. } => {
                println!("Tokens: in={:?}, out={:?}", input_tokens, output_tokens);
            }
            StreamEvent::Error(e) => eprintln!("Error: {}", e),
            _ => {}
        }
    }
    Ok(())
}

StreamEvent variants:

  • Content(String): A chunk of text content (the main output)
  • Start: Stream has started
  • Finish(Option<String>): Stream complete, with optional finish reason
  • Usage { input_tokens, output_tokens, total_tokens }: Token usage stats
  • Role(String): Role information (usually at start)
  • Error(String): An error occurred
  • Raw(String): Raw event data (for debugging)

Typed JSON Extraction

Use StreamEventExt::parse_json<T>() to extract typed data from streaming responses. This is useful when the LLM returns structured JSON (tool calls, structured output, etc.):

use cllient::streaming::{StreamEventExt, StreamItem};
use serde::Deserialize;
use schemars::JsonSchema;
use futures::StreamExt;

#[derive(Deserialize, JsonSchema)]
struct ToolCall {
    name: String,
    args: serde_json::Value,
}

// `client` and `request` are the ones built in the streaming example above
let stream = client.complete_stream(&request).await?;
let mut typed_stream = stream.parse_json::<ToolCall>();

while let Some(item) = typed_stream.next().await {
    match item {
        StreamItem::Data(tool) => println!("Tool call: {}", tool.name),
        StreamItem::Text(t) => print!("{}", t.text),  // Non-JSON text
        StreamItem::Token(tok) => print!("{}", tok),  // Raw tokens
    }
}

StreamItem<T> variants:

  • Data(T): Successfully parsed JSON matching your type
  • Text(TextContent): Plain text or JSON that doesn't match type T
  • Token(String): Individual token for real-time display
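
Note that Data(T) items only appear when the model actually emits JSON matching your type; if the prompt (or the provider's structured-output support) doesn't steer the response toward that shape, everything arrives as Text and Token items instead.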

Documentation

Getting Started

Core Guides

Advanced Usage

Architecture

The core idea is "unassociated truth" - services and models are separate entities that can be mixed:

Traditional:  gpt-4 → hardcoded to OpenAI API
cllient:      gpt-4 → { OpenAI, Azure, OpenRouter }
              Model configs reference service configs
              Service configs are HTTP templates

How it works:

  1. Service configs (config/service/*.yaml) define HTTP request templates using Handlebars:

    http:
      request: |
        POST /v1/chat/completions HTTP/1.1
        Authorization: Bearer ${OPENAI_API_KEY}

        {"model": "{{model_id}}", "messages": {{json messages}}}
    
  2. Model configs (config/family/*/*.yaml) reference a service and add metadata:

    model:
      id: gpt-4o-mini
      service: openai  # Points to config/service/openai.yaml
    capabilities:
      context_window: 128000
    
  3. Runtime loads configs, renders templates, makes HTTP requests
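
To make step 3 concrete, here is roughly what the step-1 template renders to for gpt-4o-mini and a single user message. This is a sketch: the exact serialization produced by {{json messages}} (an OpenAI-style role/content array is assumed here) and the final header set depend on the service config.

POST /v1/chat/completions HTTP/1.1
Authorization: Bearer <value of OPENAI_API_KEY>

{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello, world!"}]}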

Why this is neat:

  • Add new providers by writing YAML, not Rust
  • Same model can work through multiple services (see the sketch after this list)
  • HTTP templates make debugging transparent (see exactly what's sent)
  • Configs can be embedded in binary or loaded at runtime
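
As a sketch of the second point above, routing an existing model through a different service only takes another model config that names that service. The file path and the openrouter service name below are illustrative assumptions, not files shipped with the crate:

# config/family/gpt/4o-openrouter.yaml (hypothetical)
model:
  id: gpt-4o
  service: openrouter  # same model id, resolved through a different service config
capabilities:
  context_window: 128000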

Core Components:

Configuration

What's Actually Available

Tested & Working (97 models across 5 direct integrations):

  • OpenAI (35 models): gpt-4o, gpt-4o-mini, o1, o1-mini, etc.
  • Anthropic (12 models): claude-3-opus, claude-3-5-sonnet, claude-3-haiku
  • DeepSeek (22 models): deepseek-chat, deepseek-coder, deepseek-v3
  • Google (25 models): gemini-2.0-flash, gemini-pro, gemma variants
  • Legacy OpenAI (3 models): Old completion endpoints

Via OpenRouter (242 models, untested):

  • Meta, Microsoft, Cohere, X.AI, Perplexity, and 50+ others
  • These are auto-generated configs that should work but haven't been validated
  • Requires OPEN_ROUTER_API_KEY environment variable

Environment Variables

# Tested providers (add to .env)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
DEEPSEEK_API_KEY=your_deepseek_key

# OpenRouter (gives access to 242 models)
OPEN_ROUTER_API_KEY=your_openrouter_key
HTTP_REFERER=http://localhost:3000  # Required by OpenRouter
X_TITLE=cllient                      # Optional app name

# Google (untested but configured)
GOOGLE_API_KEY=your_google_key

# Optional: Custom config directory
CLLIENT_CONFIG_DIR=/path/to/custom/configs

Adding Providers

# config/service/newprovider.yaml
service:
  name: NewProvider
  base_url: https://api.newprovider.com

http:
  request: |
    POST /v1/chat/completions HTTP/1.1
    Authorization: Bearer ${NEWPROVIDER_API_KEY}
    
    {
      "model": "{{model_id}}",
      "messages": {{json messages}},
      "stream": {{stream}}
    }
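
A model config then points at the new service by name, following the schema from the Architecture section. The file path, model id, and context window below are placeholders:

# config/family/newprovider/example-model.yaml (hypothetical)
model:
  id: example-model
  service: newprovider  # matches config/service/newprovider.yaml
capabilities:
  context_window: 32000

With NEWPROVIDER_API_KEY set in .env, the new model should then show up in cllient list.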

Complete Configuration Guide: 4. Configuration Documentation

Development

Build & Test

# Build project
make build

# Run tests  
make test

# See all commands
make help

Model Management

# Fetch latest models from providers
make models-openai
make models-anthropic  
make models-all

# Pricing analysis
make pricing-cheapest
make cost-analysis

Complete Development Guide: 6. Development Documentation

Status

Completed:

  • Core Rust implementation with SSE streaming
  • Streaming JSON output (valid JSON emitted incrementally)
  • Config-driven architecture (YAML templates)
  • Basic CLI tool (ask, stream, chat, compare)
  • 3 tested providers (OpenAI, Anthropic, DeepSeek)
  • OpenRouter integration (242 auto-generated configs)
  • Pricing data scraping from OpenRouter

Aspirational (Not Started):

  • Test the 242 OpenRouter models individually
  • Validate Google/Azure/other direct integrations
  • Python bindings (PyO3)
  • TypeScript type generation
  • Proper error handling and retry logic
  • Production hardening
  • Local model support (llama.cpp, etc.)

Contributing

This is an experimental project, so contributions are welcome but come with caveats:

Good first contributions:

  • Test one of the 242 OpenRouter models and report if it works
  • Add a new direct provider integration (not through OpenRouter)
  • Improve error handling for common failure modes
  • Fix the brittle template system
  • Add tests (there aren't many)

Before contributing:

  1. Understand this is a research project, not production software
  2. The codebase was mostly AI-generated and may have lurking issues
  3. Check existing issues to see if someone's already working on it

If you still want to help:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make changes and test: make test && make lint
  4. Submit a pull request with clear description of what you tested

See: 6. Development Guide for detailed workflows.

Should I Use This?

Use this if:

  • You want to experiment with the config-driven architecture idea
  • You're researching LLM client design patterns
  • You need quick access to OpenRouter's 242 models via one interface
  • You're okay debugging YAML templates when they break
  • You want to learn how SSE streaming works in Rust

Don't use this if:

  • You need a production-ready LLM client (use the official SDKs)
  • You want comprehensive error handling and retries
  • You need 100% stability and uptime
  • You're not comfortable with experimental software
  • You need Python/JS bindings (they don't exist yet)

Alternatives to consider:

  • Official SDKs: openai, anthropic, google-generativeai Python packages
  • LangChain / LlamaIndex: More mature, battle-tested frameworks
  • LiteLLM: Similar unified interface concept, but production-ready

License

This project is licensed under the terms specified in the LICENSE file.

Source Code

https://github.com/JuggernautLabs/cllient

Quick Links: Documentation | Examples | Configuration | Architecture


Disclaimer: This is experimental research software. For production use, consider LiteLLM or official provider SDKs.
