| Crates.io | lancor |
| lib.rs | lancor |
| version | 0.1.1 |
| created_at | 2025-11-18 08:31:33.962943+00 |
| updated_at | 2025-11-18 09:14:10.933782+00 |
| description | Rust client for llama.cpp's OpenAI compatible API server |
| homepage | https://github.com/dirmacs/lancor |
| repository | https://github.com/dirmacs/lancor |
| max_upload_size | |
| id | 1938052 |
| size | 103,110 |
A Rust client library for llama.cpp's OpenAI-compatible API server.
Add this to your Cargo.toml:
[dependencies]
lancor = "0.1.1"
tokio = { version = "1.0", features = ["full"] }
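The snippets below also import anyhow (error handling in the quick start) and futures (the StreamExt trait in the streaming example). If you copy them verbatim, add those crates as well; the versions here are only illustrative:
anyhow = "1.0"
futures = "0.3"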
use lancor::{LlamaCppClient, ChatCompletionRequest, Message};
use anyhow::Result;
#[tokio::main]
async fn main() -> Result<()> {
    // Create a client
    let client = LlamaCppClient::new("http://localhost:8080")?;

    // Build a chat completion request
    let request = ChatCompletionRequest::new("your-model-name")
        .message(Message::system("You are a helpful assistant."))
        .message(Message::user("What is Rust?"))
        .max_tokens(100);

    // Send the request
    let response = client.chat_completion(request).await?;
    println!("{}", response.choices[0].message.content);

    Ok(())
}
use lancor::{LlamaCppClient, ChatCompletionRequest, Message};
let client = LlamaCppClient::new("http://localhost:8080")?;
let request = ChatCompletionRequest::new("model-name")
    .message(Message::system("You are a helpful assistant."))
    .message(Message::user("Explain quantum computing"))
    .temperature(0.7)
    .max_tokens(200);
let response = client.chat_completion(request).await?;
println!("{}", response.choices[0].message.content);
use lancor::{LlamaCppClient, ChatCompletionRequest, Message};
use futures::stream::StreamExt;
let client = LlamaCppClient::new("http://localhost:8080")?;
let request = ChatCompletionRequest::new("model-name")
    .message(Message::user("Write a short poem"))
    .stream(true)
    .max_tokens(100);

let mut stream = client.chat_completion_stream(request).await?;

while let Some(chunk_result) = stream.next().await {
    if let Ok(chunk) = chunk_result {
        if let Some(content) = &chunk.choices[0].delta.content {
            print!("{}", content);
        }
    }
}
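Note that print! does not flush stdout on its own, so streamed tokens can appear in bursts rather than as they arrive. A minimal tweak, using only the standard library and independent of lancor, is to flush after printing each chunk:
use std::io::{self, Write};

// Inside the streaming loop, after printing the delta content:
print!("{}", content);
let _ = io::stdout().flush();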
use lancor::{LlamaCppClient, CompletionRequest};
let client = LlamaCppClient::new("http://localhost:8080")?;
let request = CompletionRequest::new("model-name", "Once upon a time")
    .max_tokens(50)
    .temperature(0.8);
let response = client.completion(request).await?;
println!("{}", response.content);
use lancor::{LlamaCppClient, EmbeddingRequest};
let client = LlamaCppClient::new("http://localhost:8080")?;
let request = EmbeddingRequest::new("model-name", "Hello, world!");
let response = client.embedding(request).await?;
let embedding_vector = &response.data[0].embedding;
println!("Embedding dimension: {}", embedding_vector.len());
use lancor::LlamaCppClient;
// With API key
let client = LlamaCppClient::with_api_key(
    "http://localhost:8080",
    "your-api-key",
)?;
LlamaCppClient - The main client for interacting with the llama.cpp server.

new(base_url) - Create a new client
with_api_key(base_url, api_key) - Create a client with API key authentication
default() - Create a client connecting to http://localhost:8080
chat_completion(request) - Send a chat completion request
chat_completion_stream(request) - Send a streaming chat completion request
completion(request) - Send a text completion request
embedding(request) - Send an embedding request

All request types support a fluent builder pattern:
ChatCompletionRequest::new("model")
.message(Message::user("Hello"))
.temperature(0.7)
.max_tokens(100)
.top_p(0.9)
.stream(true);
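The default() constructor listed above targets http://localhost:8080, so a local client can be created without spelling out the URL. A minimal sketch, assuming default() returns the client directly as its description suggests:
use lancor::LlamaCppClient;

// Connects to http://localhost:8080
let client = LlamaCppClient::default();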
To use this client, run the llama.cpp server with its OpenAI-compatible endpoints enabled (the --api-key flag is optional):
./server -m your-model.gguf --port 8080
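If you protect the server with an API key, start it with the --api-key flag mentioned above and pass the same key to the client via with_api_key. A sketch, assuming the flag takes the key as its value:
./server -m your-model.gguf --port 8080 --api-key your-api-key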
Check out the examples directory for more usage examples:
cargo run --example basic_usage
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.