| Crates.io | candle-pipelines |
| lib.rs | candle-pipelines |
| version | 0.0.7 |
| created_at | 2025-12-27 01:44:50.049466+00 |
| updated_at | 2026-01-05 00:05:45.209953+00 |
| description | Simple, intuitive pipelines for local LLM inference in Rust, powered by Candle. Inspired by Python's Transformers library. |
| homepage | |
| repository | https://github.com/ljt019/candle-pipelines/ |
| max_upload_size | |
| id | 2006565 |
| size | 511,705 |
[!warning] This crate is under active development. APIs may change as features are added and details are tweaked.
Simple, intuitive pipelines for local LLM inference in Rust, powered by Candle. API inspired by Python's Transformers.
Note: Currently, models are accessible only through these pipelines. A direct model interface is coming eventually!
Generate text for various applications. Supports completions, tool calling, and token-by-token iteration.
Qwen3
Optimized for tool calling and structured output
Parameter Sizes:
├── 0.6B
├── 1.7B
├── 4B
├── 8B
├── 14B
└── 32B
Gemma3
Google's models for general language tasks
Parameter Sizes:
├── 1B
├── 4B
├── 12B
└── 27B
Llama 3.2
Meta's compact instruction-tuned models
Parameter Sizes:
├── 1B
└── 3B
OLMo-3
Allen AI's open language models with tool support
Parameter Sizes:
├── 7B
└── 32B
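Each family is picked through a dedicated builder constructor, with the parameter size selected via the family's size enum. Below is a minimal sketch of switching sizes; it assumes the other Qwen3 variants follow the Size0_6B naming pattern used in the examples later in this README (Qwen3::Size4B is an assumed variant name):
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{Qwen3, TextGenerationPipelineBuilder};
fn main() -> Result<()> {
// Same builder API, larger checkpoint. `Qwen3::Size4B` is assumed to
// mirror the `Qwen3::Size0_6B` variant naming shown elsewhere in this README.
let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size4B)
.temperature(0.7)
.build()?;
let output = pipeline.run("Summarize the Candle ML framework in one sentence.")?;
println!("{}", output.text);
Ok(())
}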
ModernBERT powers three specialized analysis tasks with shared architecture:
Fill-Mask
Complete missing words in text
Available Sizes:
├── Base
└── Large
Sentiment Analysis
Analyze emotional tone in multiple languages
Available Sizes:
├── Base
└── Large
Zero-Shot Classification
Classify text without training examples
Available Sizes:
├── Base
└── Large
Technical Note: All ModernBERT pipelines share the same backbone architecture, loading task-specific fine-tuned weights as needed.
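To make this concrete, here is a minimal sketch building all three ModernBERT pipelines at the same backbone size, using the builders from the per-task examples below (this assumes ModernBertSize is re-exported consistently across the three task modules, as those examples suggest):
use candle_pipelines::error::Result;
use candle_pipelines::fill_mask::{FillMaskPipelineBuilder, ModernBertSize};
use candle_pipelines::sentiment::SentimentAnalysisPipelineBuilder;
use candle_pipelines::zero_shot::ZeroShotClassificationPipelineBuilder;
fn main() -> Result<()> {
// One backbone size, three task heads: each builder loads the shared
// ModernBERT architecture with its own task-specific fine-tuned weights.
let _fill_mask = FillMaskPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
let _sentiment = SentimentAnalysisPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
let _zero_shot = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
Ok(())
}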
At this point in development, the pipelines are the only way to interact with the models; I plan to eventually provide a simple interface for working with the models directly.
Inference is currently quite slow, mostly because candle is compiled without its CUDA feature. I will be working on integrating this smoothly in future updates for much faster inference.
There are two basic ways to generate text:
Use the run method for straightforward text generation from a single prompt string.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3};
fn main() -> Result<()> {
// 1. Create the pipeline
let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
.temperature(0.7)
.top_k(40)
.build()?;
// 2. Generate a completion - returns Output { text, stats }
let output = pipeline.run("What is the meaning of life?")?;
println!("{}", output.text);
println!("Generated {} tokens", output.stats.tokens_generated);
Ok(())
}
For more conversational interactions, you can pass a list of messages to the run method.
The Message struct represents a single message in a chat and has a role (system, user, assistant, or tool) and content. You can create messages using:
Message::system(content: &str): For system prompts.
Message::user(content: &str): For user prompts.
Message::assistant(content: &str): For model responses.
Message::tool(content: &str): For tool/function results returned to the model.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3, Message};
fn main() -> Result<()> {
// 1. Create the pipeline
let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
.temperature(0.7)
.top_k(40)
.build()?;
// 2. Create the messages
let messages = vec![
Message::system("You are a helpful assistant."),
Message::user("What is the meaning of life?"),
];
// 3. Generate a completion
let output = pipeline.run(&messages)?;
println!("{}", output.text);
Ok(())
}
Using tools with models is also straightforward: define tools with the #[tool] macro, register them with the pipeline, and they're executed automatically when the model calls them.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{tool, tools, ErrorStrategy};
use candle_pipelines::text_generation::{Qwen3, TextGenerationPipelineBuilder};
// 1. Define tools using the #[tool] macro
#[tool(retries = 5)] // optional: configure retry attempts
/// Get the humidity for a given city.
fn get_humidity(city: String) -> Result<String> {
Ok(format!("The humidity is 50% in {}.", city))
}
#[tool] // defaults to 3 retries
/// Get the temperature for a given city in degrees celsius.
fn get_temperature(city: String) -> Result<String> {
Ok(format!("The temperature is 20 degrees celsius in {}.", city))
}
fn main() -> Result<()> {
// 2. Create the pipeline
let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
.max_len(8192)
.tool_error_strategy(ErrorStrategy::ReturnToModel) // let model handle tool errors
.build()?;
// 3. Register tools (enabled by default)
pipeline.register_tools(tools![get_temperature, get_humidity]);
// 4. Get a completion - tools are used automatically
let output = pipeline.run("What's the temp and humidity like in Tokyo?")?;
println!("{}", output.text);
Ok(())
}
Tools can also be asynchronous, allowing you to perform network or file I/O directly inside the handler:
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::tool;
#[tool]
/// Echoes a message after waiting for a bit.
async fn delayed_echo(message: String) -> Result<String> {
tokio::time::sleep(std::time::Duration::from_millis(25)).await;
Ok(message)
}
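Registering an async tool looks the same as registering a sync one. A minimal sketch continuing from the delayed_echo definition above (that the pipeline handles the async runtime internally is assumed from the sync examples):
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{tools, Qwen3, TextGenerationPipelineBuilder};
fn main() -> Result<()> {
let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B).build()?;
// Async tools go through the same tools! registration macro.
pipeline.register_tools(tools![delayed_echo]);
let output = pipeline.run("Echo back the word 'hello'.")?;
println!("{}", output.text);
Ok(())
}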
Use run_iter to receive tokens as they're generated. Fully sync - no async runtime needed.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{TextGenerationPipelineBuilder, Qwen3};
use std::io::Write;
fn main() -> Result<()> {
// 1. Create the pipeline
let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
.max_len(1024)
.build()?;
// 2. Iterate over tokens as they're generated
let mut tokens = pipeline.run_iter(
"Explain the concept of Large Language Models in simple terms.",
)?;
// 3. Print tokens as they arrive
for tok in &mut tokens {
print!("{}", tok?);
std::io::stdout().flush().unwrap();
}
// 4. Get stats after iteration
let stats = tokens.stats();
println!("\n\nGenerated {} tokens", stats.tokens_generated);
Ok(())
}
Use XmlParser to parse structured outputs from models - useful for reasoning traces like <think> blocks.
use candle_pipelines::error::Result;
use candle_pipelines::text_generation::{
Event, Qwen3, TagPart, TextGenerationPipelineBuilder, XmlTag,
};
// 1. Define which tags to parse using an enum
#[derive(Debug, Clone, PartialEq, XmlTag)]
enum Tags {
Think, // matches <think>
Answer, // matches <answer>
}
fn main() -> Result<()> {
// 2. Build a regular pipeline
let pipeline = TextGenerationPipelineBuilder::qwen3(Qwen3::Size0_6B)
.max_len(1024)
.build()?;
// 3. Create parser from tag enum
let parser = Tags::parser();
// 4. Get token iterator and wrap with XML parser
let tokens = pipeline.run_iter("Think step by step, then answer.")?;
let events = parser.parse_iter(tokens);
// 5. Process events using pattern matching
for event in events {
match event? {
Event::Tag { tag: Tags::Think, part } => match part {
TagPart::Opened { .. } => println!("[THINKING]"),
TagPart::Content { text } => print!("{}", text),
TagPart::Closed { .. } => println!("[END THINKING]"),
},
Event::Tag { tag: Tags::Answer, part } => match part {
TagPart::Content { text } => print!("{}", text),
_ => {}
},
Event::Content { text } => print!("{}", text),
}
}
Ok(())
}
The XML parser emits events as tags are encountered, enabling real-time processing without waiting for the full response.
use candle_pipelines::error::Result;
use candle_pipelines::fill_mask::{FillMaskPipelineBuilder, ModernBertSize};
fn main() -> Result<()> {
// 1. Build the pipeline
let pipeline = FillMaskPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
// 2. Fill the mask
let output = pipeline.run("The capital of France is [MASK].")?;
println!("{}: {:.2}", output.prediction.token, output.prediction.score);
// Output: Paris: 0.98
Ok(())
}
use candle_pipelines::error::Result;
use candle_pipelines::sentiment::{SentimentAnalysisPipelineBuilder, ModernBertSize};
fn main() -> Result<()> {
// 1. Build the pipeline
let pipeline = SentimentAnalysisPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
// 2. Analyze sentiment
let output = pipeline.run("I love using Rust for my projects!")?;
println!("Sentiment: {} (confidence: {:.2})", output.prediction.label, output.prediction.score);
// Output: Sentiment: positive (confidence: 0.98)
Ok(())
}
Zero-shot classification offers two methods for different use cases:
Single-label classification (run): Use when you want to classify text into one of several mutually exclusive categories. Probabilities sum to 1.0.
use candle_pipelines::error::Result;
use candle_pipelines::zero_shot::{ZeroShotClassificationPipelineBuilder, ModernBertSize};
fn main() -> Result<()> {
// 1. Build the pipeline
let pipeline = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
// 2. Single-label classification
let text = "The Federal Reserve raised interest rates.";
let labels = &["economics", "politics", "technology", "sports"];
let output = pipeline.run(text, labels)?;
println!("Text: {}", text);
for p in &output.predictions {
println!("- {}: {:.4}", p.label, p.score);
}
// Example output (probabilities sum to 1.0):
// - economics: 0.8721
// - politics: 0.1134
// - technology: 0.0098
// - sports: 0.0047
Ok(())
}
Multi-label classification (run_multi_label): Use when labels are independent and multiple labels could apply to the same text. Returns raw entailment probabilities.
use candle_pipelines::error::Result;
use candle_pipelines::zero_shot::{ZeroShotClassificationPipelineBuilder, ModernBertSize};
fn main() -> Result<()> {
// 1. Build the pipeline
let pipeline = ZeroShotClassificationPipelineBuilder::modernbert(ModernBertSize::Base).build()?;
// 2. Multi-label classification
let text = "I love reading books about machine learning and artificial intelligence.";
let labels = &["technology", "education", "reading", "science"];
let output = pipeline.run_multi_label(text, labels)?;
println!("Text: {}", text);
for p in &output.predictions {
println!("- {}: {:.4}", p.label, p.score);
}
// Example output (independent probabilities):
// - technology: 0.9234
// - education: 0.8456
// - reading: 0.9567
// - science: 0.7821
Ok(())
}