omniference 0.1.3

A multi-protocol inference engine with provider adapters.

Repository: https://github.com/Henrik-3/omniference
Author: Henrik Mertens (Henrik-3)

README

Omniference

A flexible, multi-protocol inference engine that provides a unified interface to AI model providers such as Ollama and OpenAI through a common API.

Features

  • Multi-provider support: Ollama, OpenAI, and extensible architecture for more providers
  • Streaming support: Real-time streaming responses from AI models
  • OpenAI-compatible API: Drop-in replacement for OpenAI's API
  • Multiple interfaces: HTTP server, Discord bot, CLI, library usage
  • Embeddable: Can be integrated into existing Axum applications
  • Async/await: Built on Tokio for high-performance async operations
  • Type-safe: Strong typing throughout the library

Architecture

The library is organized in layers:

  • Core Layer: Router, adapters, and types (pure inference logic)
  • Service Layer: Provider management and model resolution
  • Interface Layer: HTTP APIs, Discord bot, CLI, etc.
  • Application Layer: Full server or embeddable components
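
In code, these layers are reached through the two entry points used throughout this README: OmniferenceEngine roughly covers the core and service layers, while OmniferenceServer adds the interface and application layers on top. Both are shown in the examples below.

// Core + service layers: drive inference directly from your own async code.
use omniference::OmniferenceEngine;

// Interface + application layers: expose an OpenAI-compatible HTTP surface,
// standalone or nested into an existing Axum router.
use omniference::server::OmniferenceServer;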

Quick Start

Installation

Add to your Cargo.toml:

[dependencies]
omniference = "0.1.0"

Usage Examples

1. Library Usage

Use Omniference as a library in any async context:

use omniference::{OmniferenceEngine, types::{ProviderConfig, ProviderKind, ProviderEndpoint}};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create engine
    let mut engine = OmniferenceEngine::new();
    
    // Register provider
    engine.register_provider(ProviderConfig {
        name: "ollama".to_string(),
        endpoint: ProviderEndpoint {
            kind: ProviderKind::Ollama,
            base_url: "http://localhost:11434".to_string(),
            api_key: None,
            extra_headers: std::collections::BTreeMap::new(),
            timeout: Some(30000),
        },
        enabled: true,
    }).await?;
    
    // Create chat request
    let request = omniference::types::ChatRequestIR {
        model: omniference::types::ModelRef {
            alias: "llama3.2".to_string(),
            // Pick the provider of the first discovered model (assumes at least one is available).
            provider: engine.list_models().await[0].provider.clone(),
        },
        messages: vec![
            omniference::types::Message {
                role: "user".to_string(),
                content: "Hello! How are you?".to_string(),
            },
        ],
        metadata: std::collections::HashMap::new(),
        stream: false,
    };
    
    // Execute chat
    let response = engine.chat_complete(request).await?;
    println!("Response: {}", response);
    
    Ok(())
}

2. Standalone HTTP Server

Run as a standalone HTTP server with an OpenAI-compatible API:

use omniference::{server::OmniferenceServer, types::{ProviderConfig, ProviderKind, ProviderEndpoint}};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut server = OmniferenceServer::new();
    
    server.add_provider(ProviderConfig {
        name: "ollama".to_string(),
        endpoint: ProviderEndpoint {
            kind: ProviderKind::Ollama,
            base_url: "http://localhost:11434".to_string(),
            api_key: None,
            extra_headers: std::collections::BTreeMap::new(),
            timeout: Some(30000),
        },
        enabled: true,
    }).await?;
    
    server.run("0.0.0.0:8080").await
}

3. Embedded in Existing Axum Application

Integrate into an existing Axum application:

use axum::{routing::get, Router};
use omniference::{server::OmniferenceServer, types::{ProviderConfig, ProviderKind, ProviderEndpoint}};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Your existing app
    let app = Router::new()
        .route("/", get(|| async { "Welcome to my app!" }));
    
    // Add Omniference
    let mut omniference_server = OmniferenceServer::new();
    
    omniference_server.add_provider(ProviderConfig {
        name: "ollama".to_string(),
        endpoint: ProviderEndpoint {
            kind: ProviderKind::Ollama,
            base_url: "http://localhost:11434".to_string(),
            api_key: None,
            extra_headers: std::collections::BTreeMap::new(),
            timeout: Some(30000),
        },
        enabled: true,
    }).await?;
    
    // Mount Omniference under /ai
    let app = app.nest("/ai", omniference_server.app());
    
    // Run combined app
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
    axum::serve(listener, app).await?;
    
    Ok(())
}
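
With the Omniference routes nested under /ai, the endpoints listed under API Endpoints below are served beneath that prefix (assuming the mounted router exposes the same paths as the standalone server), for example:

curl http://localhost:3000/ai/api/openai-compatible/v1/models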

4. Discord Bot Integration

Create a Discord bot with AI capabilities. Discord support is behind the optional discord feature, so enable it in Cargo.toml:

[dependencies]
omniference = { version = "0.1.0", features = ["discord"] }

Then wire the engine into your bot:

use omniference::{OmniferenceEngine, types::{ProviderConfig, ProviderKind, ProviderEndpoint}};

// Set up engine in your Discord bot handler
let mut engine = OmniferenceEngine::new();
engine.register_provider(ProviderConfig {
    // ... provider configuration
}).await?;

// Use in message handlers (`request` is built as in the library example above)
let response = engine.chat_complete(request).await?;

Examples

The crate includes several examples:

  • cargo run --example library_usage - Basic library usage
  • cargo run --example embedded_axum - Embed in existing Axum app
  • cargo run --example standalone_server - Run as standalone server
  • cargo run --example discord_bot - Discord bot integration

API Endpoints

When running as a server, Omniference provides:

  • POST /api/openai/v1/responses - OpenAI Responses API (new, OpenAI-only)
  • GET /api/openai/v1/models - List available models
  • POST /api/openai-compatible/v1/chat/completions - OpenAI-compatible Chat Completions
  • GET /api/openai-compatible/v1/models - OpenAI-compatible models endpoint
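
For example, a chat completion against the OpenAI-compatible endpoint of the standalone server, using the provider-prefixed model format described under Model Resolution below. The request body follows the standard OpenAI Chat Completions shape; the host, port, and model name are illustrative:

curl http://localhost:8080/api/openai-compatible/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llama3.2",
    "messages": [{"role": "user", "content": "Hello! How are you?"}],
    "stream": false
  }'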

Configuration

Examples via .env

Examples read configuration from environment variables. Copy .env.example to .env and set values as needed:

cp .env.example .env
# then edit .env

Key variables:

  • OLLAMA_BASE_URL (default http://localhost:11434)
  • OPENAI_BASE_URL (default https://api.openai.com)
  • OPENAI_API_KEY (required for OpenAI-compatible examples)
  • DISCORD_TOKEN (required for the Discord example)
  • SERVER_ADDR and EMBEDDED_SERVER_ADDR to change example ports
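
A sample .env for the examples (values below are illustrative placeholders; set only what the example you run actually needs):

OLLAMA_BASE_URL=http://localhost:11434
OPENAI_BASE_URL=https://api.openai.com
OPENAI_API_KEY=your-openai-api-key
DISCORD_TOKEN=your-discord-bot-token
SERVER_ADDR=0.0.0.0:8080
EMBEDDED_SERVER_ADDR=0.0.0.0:3000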

Provider Configuration

Configure providers with custom endpoints and settings:

ProviderConfig {
    name: "my-ollama".to_string(),
    endpoint: ProviderEndpoint {
        kind: ProviderKind::Ollama,
        base_url: "http://custom-server:11434".to_string(),
        api_key: Some("your-api-key".to_string()),
        extra_headers: {
            let mut headers = std::collections::BTreeMap::new();
            headers.insert("Custom-Header".to_string(), "value".to_string());
            headers
        },
        timeout: Some(60000),
    },
    enabled: true,
}
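
An OpenAI-backed provider is configured the same way. A minimal sketch, assuming a ProviderKind variant for OpenAI exists (check the crate's ProviderKind enum for the exact name) and reading credentials from the environment variables described above:

ProviderConfig {
    name: "openai".to_string(),
    endpoint: ProviderEndpoint {
        // Variant name assumed; consult the ProviderKind enum in the crate docs.
        kind: ProviderKind::OpenAI,
        base_url: std::env::var("OPENAI_BASE_URL")
            .unwrap_or_else(|_| "https://api.openai.com".to_string()),
        api_key: std::env::var("OPENAI_API_KEY").ok(),
        extra_headers: std::collections::BTreeMap::new(),
        timeout: Some(60000),
    },
    enabled: true,
}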

Model Resolution

Models are auto-discovered from providers and can be referenced using:

  • Provider-prefixed format: ollama/llama3.2
  • Direct model names: llama3.2
  • Custom aliases configured by your application

Building and Testing

# Build the library
cargo build

# Run tests
cargo test

# Run examples
cargo run --example library_usage
cargo run --example standalone_server

# Run with Discord support
cargo run --example discord_bot --features discord

Features

  • default: Core functionality without optional dependencies
  • discord: Enables Discord bot integration with Serenity

License

MIT - see LICENSE for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Roadmap

  • Support for more providers (OpenAI, Anthropic, etc.)
  • Additional protocol skins (Anthropic Messages API, etc.)
  • Advanced configuration management
  • Performance optimizations and caching
  • Monitoring and metrics
  • Load balancing and failover