openllm-rs

Crates.io: openllm-rs
lib.rs: openllm-rs
version: 0.1.0
created_at: 2026-01-18 01:59:08.506852+00
updated_at: 2026-01-18 01:59:08.506852+00
description: A Rust crate that enables seamless interaction with Llama.cpp backend services through an OpenAI-compatible REST API
repository: https://github.com/liuzhi19121999/openllm-rs
owner: liuzhi19121999
size: 68,818

README

openllm-rs - Rust Library for OpenAI-like Llama.cpp Remote Inference

📦 Overview

openllm-rs is a Rust crate that enables seamless interaction with Llama.cpp backend services through an OpenAI-compatible REST API. It provides a high-level abstraction for managing chat sessions, MCP tool integration, and customizable inference parameters.

✅ Key Features

  • 🗣 Multi-turn conversation management with history caching
  • 🧰 MCP tool integration for external function calls
  • 📌 Customizable system prompts and temperature control
  • 🔄 Automatic health check before inference
  • 📊 Configurable maximum context length (up to 7000 tokens)
  • 🔄 Asynchronous request handling with tokio runtime

📦 Installation

Add this to your Cargo.toml:

[dependencies]
openllm-rs = "0.1.0"

📘 Usage Example

use openllm_rs::{LLMs, mcp::MCP};

async fn it_works() {
    // Initialize LLM client with server configuration
    let mut llm = LLMs::new();
    llm.set_scheme("http".to_string())
       .set_host("192.168.1.19".to_string())
       .set_port("7800".to_string())
       .set_temperature(0.8)
       .set_need_health_check(true)
       .set_max_length(7000)
       .set_cache_history(true);

    // Configure MCP tools for function calling
    let mcp0 = MCP::new("http://127.0.0.1:7055", "HowToCook");
    let mcp1 = MCP::new("http://127.0.0.1:8001", "Calculate");
    
    llm.set_mcp_names(vec!["HowToCook".to_string(), "Calculate".to_string()])
       .set_mcps(vec![mcp0, mcp1]).await;

    // Example conversation flow
    let mut resp = llm.chat_request("Hello".to_string()).await;
    println!("{}", resp);
    
    resp = llm.chat_request("Today is June 12, 2028. Remember this key information".to_string()).await;
    println!("{}", resp);
    
    // Test memory retention and tool calling
    resp = llm.chat_request("What date is today?".to_string()).await;
    println!("{}", resp);
    
    resp = llm.chat_request("1.5 + 2 equals what?".to_string()).await;
    println!("{}", resp);
    
    resp = llm.chat_request("How to make tomato scrambled eggs?".to_string()).await;
    println!("{}", resp);
}
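
To actually run this async function you need a Tokio runtime (see Requirements). A minimal sketch, assuming the tokio dependency shown in the Installation section:

// Entry point that drives the async example above on the Tokio runtime.
#[tokio::main]
async fn main() {
    it_works().await;
}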

🧠 Technical Details

The library implements a client-server architecture where:

  • LLMs::new() creates a client instance with default configuration
  • set_* methods configure connection parameters and inference settings
  • chat_request() handles both regular text queries and tool calling requests
  • MCP tools are registered through the set_mcps() method
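
For example, a plain text query without any registered MCP tools reduces to roughly the following sketch (the host, port, and prompt are placeholders; the method names mirror the usage example above):

use openllm_rs::LLMs;

async fn plain_chat() {
    // Point the client at a running Llama.cpp server (placeholder address).
    let mut llm = LLMs::new();
    llm.set_scheme("http".to_string())
       .set_host("127.0.0.1".to_string())
       .set_port("7800".to_string())
       .set_need_health_check(true);

    // A single-turn request; call set_cache_history(true) if you want
    // later requests to see this exchange as conversation history.
    let resp = llm.chat_request("Hello".to_string()).await;
    println!("{}", resp);
}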

⚠️ Requirements

  • Llama.cpp server running on the specified host:port
  • Valid OpenAI-compatible API endpoint
  • Tokio runtime for asynchronous operations

📌 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

📜 License

Apache License 2.0 - See LICENSE file

📚 Documentation

API reference documentation is available at https://docs.openllm.rs
