| Crates.io | ferrous-llm-ollama |
| lib.rs | ferrous-llm-ollama |
| version | 0.6.1 |
| created_at | 2025-07-12 06:16:15.97982+00 |
| updated_at | 2025-08-31 10:10:48.064775+00 |
| description | Ollama provider for the LLM library |
| homepage | https://www.eurora-labs.com |
| repository | https://github.com/eurora-labs/ferrous-llm.git |
| max_upload_size | |
| id | 1749024 |
| size | 154,814 |
Ollama provider implementation for the ferrous-llm ecosystem. This crate provides a complete implementation of Ollama's local API, including chat completions, text generation, streaming responses, and embeddings for locally hosted language models.
Add this to your Cargo.toml:
[dependencies]
ferrous-llm-ollama = "0.2.0"
Or use the main ferrous-llm crate with the Ollama feature:
[dependencies]
ferrous-llm = { version = "0.6", features = ["ollama"] }
You need to have Ollama installed and running on your system:
Install Ollama: Visit ollama.ai and follow the installation instructions for your platform.
Start Ollama: Run the Ollama service:
ollama serve
Pull a model: Download a model to use:
ollama pull llama2
ollama pull codellama
ollama pull mistral
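Before wiring up the provider, it can help to confirm that the Ollama server is actually reachable. A minimal sketch, assuming reqwest and tokio are available as dependencies, using Ollama's /api/tags endpoint (the same data ollama list prints):
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // GET /api/tags lists the locally downloaded models as JSON.
    let body = reqwest::get("http://localhost:11434/api/tags")
        .await?
        .text()
        .await?;
    println!("Ollama is reachable; local models: {body}");
    Ok(())
}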
Basic chat example:
use ferrous_llm_ollama::{OllamaConfig, OllamaProvider};
use ferrous_llm_core::{ChatProvider, ChatRequest, Message, MessageContent, Role};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create configuration (no API key needed!)
let config = OllamaConfig::new("llama2");
let provider = OllamaProvider::new(config)?;
// Create a chat request
let request = ChatRequest {
messages: vec![
Message {
role: Role::User,
content: MessageContent::Text("Explain machine learning in simple terms".to_string()),
name: None,
tool_calls: None,
tool_call_id: None,
created_at: chrono::Utc::now(),
}
],
parameters: Default::default(),
metadata: Default::default(),
};
// Send the request
let response = provider.chat(request).await?;
println!("Llama2: {}", response.content());
Ok(())
}
Streaming chat example:
use ferrous_llm_ollama::{OllamaConfig, OllamaProvider};
use ferrous_llm_core::{StreamingProvider, ChatRequest};
use futures::StreamExt;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = OllamaConfig::new("codellama");
let provider = OllamaProvider::new(config)?;
let request = ChatRequest {
// ... request setup
..Default::default()
};
let mut stream = provider.chat_stream(request).await?;
print!("CodeLlama: ");
while let Some(chunk) = stream.next().await {
match chunk {
Ok(data) => print!("{}", data.content()),
Err(e) => eprintln!("Stream error: {}", e),
}
}
println!();
Ok(())
}
Set these environment variables for automatic configuration:
export OLLAMA_MODEL="llama2" # Optional, defaults to llama2
export OLLAMA_BASE_URL="http://localhost:11434" # Optional, defaults to localhost:11434
export OLLAMA_EMBEDDING_MODEL="nomic-embed-text" # Optional, for embeddings
export OLLAMA_KEEP_ALIVE="300" # Optional, model keep-alive in seconds
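With those variables exported, a provider can be built without hard-coding any values; a short sketch relying on the OllamaConfig::from_env constructor shown in the next section:
use ferrous_llm_ollama::{OllamaConfig, OllamaProvider};

// Picks up the OLLAMA_* variables above and validates them.
let config = OllamaConfig::from_env()?;
let provider = OllamaProvider::new(config)?;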
Programmatic configuration:
use ferrous_llm_ollama::OllamaConfig;
use std::time::Duration;
// Simple configuration
let config = OllamaConfig::new("mistral");
// Using the builder pattern
let config = OllamaConfig::builder()
.model("codellama")
.embedding_model("nomic-embed-text")
.keep_alive(600) // Keep model loaded for 10 minutes
.timeout(Duration::from_secs(120))
.max_retries(3)
.build();
// From environment with validation
let config = OllamaConfig::from_env()?;
Connect to a remote Ollama server:
let config = OllamaConfig::builder()
.model("llama2")
.base_url("http://ollama-server:11434")?
.build();
Ollama supports a wide variety of models. Here are some popular ones:
- llama2 - Meta's Llama 2 (7B, 13B, 70B variants)
- llama2:13b - Llama 2 13B parameter model
- llama2:70b - Llama 2 70B parameter model
- mistral - Mistral 7B model
- mixtral - Mixtral 8x7B mixture of experts
- neural-chat - Intel's neural chat model
- codellama - Code Llama for programming tasks
- codellama:python - Code Llama specialized for Python
- phind-codellama - Phind's fine-tuned Code Llama
- wizard-coder - WizardCoder model
- nomic-embed-text - Nomic's text embedding model
- all-minilm - Sentence transformer embedding model

# List downloaded models
ollama list
# Pull a new model
ollama pull mistral:7b
# Remove a model
ollama rm old-model
Text completion:
use ferrous_llm_ollama::{OllamaConfig, OllamaProvider};
use ferrous_llm_core::{CompletionProvider, CompletionRequest};
let config = OllamaConfig::new("codellama");
let provider = OllamaProvider::new(config)?;
let request = CompletionRequest {
prompt: "Write a Python function to calculate fibonacci numbers:".to_string(),
parameters: Default::default(),
metadata: Default::default(),
};
let response = provider.complete(request).await?;
println!("Generated code:\n{}", response.content());
Embeddings:
use ferrous_llm_ollama::{OllamaConfig, OllamaProvider};
use ferrous_llm_core::EmbeddingProvider;
let config = OllamaConfig::builder()
.model("llama2")
.embedding_model("nomic-embed-text")
.build();
let provider = OllamaProvider::new(config)?;
let texts = vec![
"The quick brown fox".to_string(),
"jumps over the lazy dog".to_string(),
];
let embeddings = provider.embed(&texts).await?;
for embedding in embeddings {
println!("Embedding dimension: {}", embedding.vector.len());
}
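As a quick illustration of what to do with the returned vectors, here is a cosine-similarity sketch over the two embeddings from the example above. It assumes the vector field holds f32 values; adjust the element type if the crate uses f64:
// Hypothetical helper comparing two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

let similarity = cosine_similarity(&embeddings[0].vector, &embeddings[1].vector);
println!("Similarity between the two texts: {similarity:.3}");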
Configure model behavior with custom parameters:
use ferrous_llm_core::{ChatRequest, Parameters};
let request = ChatRequest {
messages: vec![/* ... */],
parameters: Parameters {
temperature: Some(0.8), // Creativity level (0.0 - 2.0)
top_p: Some(0.9), // Nucleus sampling
top_k: Some(40), // Top-k sampling
max_tokens: Some(500), // Maximum response length
stop_sequences: vec!["Human:".to_string()], // Stop generation at these sequences
..Default::default()
},
metadata: Default::default(),
};
Pass Ollama-specific options:
let config = OllamaConfig::builder()
.model("llama2")
.options(serde_json::json!({
"num_ctx": 4096, // Context window size
"num_predict": 256, // Number of tokens to predict
"repeat_penalty": 1.1, // Repetition penalty
"temperature": 0.7, // Temperature
"top_k": 40, // Top-k sampling
"top_p": 0.9 // Top-p sampling
}))
.build();
The crate provides comprehensive error handling:
use ferrous_llm_ollama::{OllamaError, OllamaProvider};
use ferrous_llm_core::ErrorKind;
match provider.chat(request).await {
Ok(response) => println!("Success: {}", response.content()),
Err(e) => match e.kind() {
ErrorKind::InvalidRequest => eprintln!("Invalid request: {}", e),
ErrorKind::ServerError => eprintln!("Ollama server error: {}", e),
ErrorKind::NetworkError => eprintln!("Network error - is Ollama running?"),
ErrorKind::Timeout => eprintln!("Request timeout"),
_ => eprintln!("Unknown error: {}", e),
}
}
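The max_retries setting already enables retries inside the provider; if you prefer application-level control, a simple retry loop over transient error kinds might look like the sketch below. It assumes ChatRequest implements Clone, that NetworkError and Timeout are the kinds worth retrying, and that the code runs inside an async fn returning Result<_, Box<dyn std::error::Error>>:
use std::time::Duration;
use ferrous_llm_core::ErrorKind;

let mut attempts = 0;
let response = loop {
    match provider.chat(request.clone()).await {
        Ok(response) => break response,
        // Retry only transient failures, with a small linear backoff.
        Err(e) if attempts < 3
            && matches!(e.kind(), ErrorKind::NetworkError | ErrorKind::Timeout) =>
        {
            attempts += 1;
            tokio::time::sleep(Duration::from_secs(attempts)).await;
        }
        Err(e) => return Err(e.into()),
    }
};
println!("Succeeded after {attempts} retries: {}", response.content());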
Run the test suite:
# Unit tests
cargo test
# Integration tests (requires Ollama running)
cargo test --test integration_tests
Note: Integration tests require Ollama to be running with at least the llama2 model available.
See the examples directory for complete working examples:
- ollama_chat.rs - Basic chat example
- ollama_chat_streaming.rs - Streaming chat example

Run examples:
# Make sure Ollama is running and has the model
ollama pull llama2
cargo run --example ollama_chat --features ollama
Performance tip: use keep_alive to keep models in memory between requests. For example:
let config = OllamaConfig::builder()
.model("llama2:7b") // Use smaller model for speed
.keep_alive(1800) // Keep model loaded for 30 minutes
.timeout(Duration::from_secs(60)) // Reasonable timeout
.build();
Common issues and fixes:
Connection Refused
Error: Network error - is Ollama running?
Fix: start the Ollama service with ollama serve.
Model Not Found
Error: model 'llama2' not found
Fix: pull the model with ollama pull llama2, or check which models are installed with ollama list.
Out of Memory
Error: not enough memory to load model
Fix: use a smaller model (e.g., llama2:7b instead of llama2:70b).
Enable debug logging to troubleshoot issues:
use tracing_subscriber;
#[tokio::main]
async fn main() {
tracing_subscriber::fmt::init();
// Your code here
}
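For finer control than the default fmt::init(), tracing-subscriber can read the filter from RUST_LOG; this assumes the crate's env-filter feature is enabled, which is not the default:
use tracing_subscriber::EnvFilter;

// Honors RUST_LOG, e.g. RUST_LOG=debug when running the examples.
tracing_subscriber::fmt()
    .with_env_filter(EnvFilter::from_default_env())
    .init();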
| Model | Size | Use Case | Speed | Quality |
|---|---|---|---|---|
| llama2:7b | ~4GB | General chat, fast responses | Fast | Good |
| llama2:13b | ~7GB | Better reasoning | Medium | Better |
| llama2:70b | ~40GB | Complex tasks | Slow | Best |
| codellama | ~4GB | Code generation | Fast | Good for code |
| mistral | ~4GB | Efficient general purpose | Fast | Good |
| mixtral | ~26GB | High-quality responses | Medium | Excellent |
This crate is part of the ferrous-llm workspace. See the main repository for contribution guidelines.
Licensed under the Apache License 2.0. See LICENSE for details.