openllm-rs

Crates.io: openllm-rs
lib.rs: openllm-rs
version: 0.1.0
created_at: 2026-01-18 01:59:08.506852+00
updated_at: 2026-01-18 01:59:08.506852+00
description: A Rust crate that enables seamless interaction with Llama.cpp backend services through an OpenAI-compatible REST API
repository: https://github.com/liuzhi19121999/openllm-rs
owner: liuzhi19121999
size: 68,818

README

openllm-rs - Rust Library for OpenAI-like Llama.cpp Remote Inference

📦 Overview

openllm-rs is a Rust crate that enables seamless interaction with Llama.cpp backend services through an OpenAI-compatible REST API. It provides a high-level abstraction for managing chat sessions, MCP tool integration, and customizable inference parameters.

✅ Key Features

  • 🗣 Multi-turn conversation management with history caching
  • 🧰 MCP tool integration for external function calls
  • 📌 Customizable system prompts and temperature control
  • 🔄 Automatic health check before inference
  • 📊 Configurable maximum context length (up to 7000 tokens)
  • 🔄 Asynchronous request handling with tokio runtime

📦 Installation

Add this to your Cargo.toml:

[dependencies]
openllm-rs = "0.1.0"

📘 Usage Example

use openllm_rs::{LLMs, mcp::MCP};

async fn it_works() {
    // Initialize LLM client with server configuration
    let mut llm = LLMs::new();
    llm.set_scheme("http".to_string())
       .set_host("192.168.1.19".to_string())
       .set_port("7800".to_string())
       .set_temperature(0.8)
       .set_need_health_check(true)
       .set_max_length(7000)
       .set_cache_history(true);

    // Configure MCP tools for function calling
    let mcp0 = MCP::new("http://127.0.0.1:7055", "HowToCook");
    let mcp1 = MCP::new("http://127.0.0.1:8001", "Calculate");
    
    llm.set_mcp_names(vec!["HowToCook".to_string(), "Calculate".to_string()])
       .set_mcps(vec![mcp0, mcp1]).await;

    // Example conversation flow
    let mut resp = llm.chat_request("Hello".to_string()).await;
    println!("{}", resp);
    
    resp = llm.chat_request("Today is June 12, 2028. Remember this key information".to_string()).await;
    println!("{}", resp);
    
    // Test memory retention and tool calling
    resp = llm.chat_request("What date is today?".to_string()).await;
    println!("{}", resp);
    
    resp = llm.chat_request("1.5 + 2 equals what?".to_string()).await;
    println!("{}", resp);
    
    resp = llm.chat_request("How to make tomato scrambled eggs?".to_string()).await;
    println!("{}", resp);
}
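
To actually run this async function you need a Tokio runtime (see Requirements). A minimal sketch, assuming the tokio dependency shown in the Installation section:

// Entry point that drives the async example above on the Tokio runtime.
#[tokio::main]
async fn main() {
    it_works().await;
}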

🧠 Technical Details

The library implements a client-server architecture where:

  • LLMs::new() creates a client instance with default configuration
  • set_* methods configure connection parameters and inference settings
  • chat_request() handles both regular text queries and tool calling requests
  • MCP tools are registered through the set_mcps() method
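
For example, a plain text query without any registered MCP tools reduces to roughly the following sketch (the host, port, and prompt are placeholders; the method names mirror the usage example above):

use openllm_rs::LLMs;

async fn plain_chat() {
    // Point the client at a running Llama.cpp server (placeholder address).
    let mut llm = LLMs::new();
    llm.set_scheme("http".to_string())
       .set_host("127.0.0.1".to_string())
       .set_port("7800".to_string())
       .set_need_health_check(true);

    // A single-turn request; call set_cache_history(true) if you want
    // later requests to see this exchange as conversation history.
    let resp = llm.chat_request("Hello".to_string()).await;
    println!("{}", resp);
}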

⚠️ Requirements

  • Llama.cpp server running on the specified host:port
  • Valid OpenAI-compatible API endpoint
  • Tokio runtime for asynchronous operations

📌 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

📜 License

Apache License 2.0 - See LICENSE file

📚 Documentation

API reference documentation is available at https://docs.openllm.rs
