| Crates.io | metal-candle |
| lib.rs | metal-candle |
| version | 1.3.0 |
| created_at | 2025-12-10 23:03:40.934355+00 |
| updated_at | 2025-12-18 19:16:38.939605+00 |
| description | Production-quality Rust ML crate for Apple Silicon - LoRA training, inference, and text generation using Candle with Metal backend |
| homepage | https://github.com/GarthDB/metal-candle |
| repository | https://github.com/GarthDB/metal-candle |
| max_upload_size | |
| id | 1978907 |
| size | 1,938,798 |
Production-quality Rust ML library for Apple Silicon - LoRA training, text generation, and semantic embeddings
Pure Rust machine learning library optimized for Apple Silicon.
Why metal-candle? 25.9x faster than MLX for embeddings, single-binary deployment, type-safe ML, and production-ready quality (407 tests, 81.6% coverage).
metal-candle demonstrates exceptional performance on Apple Silicon:
| Task | Batch Size | metal-candle | MLX | Speedup |
|---|---|---|---|---|
| Embeddings | 100 docs | 4.4ms | 113.5ms | 25.9x |
| Embeddings | Single query | 3.9ms | 7.7ms | 2.0x |
| Throughput | - | 22,831 docs/sec | 881 docs/sec | 25.9x |
Near constant-time performance: going from batch 1 to batch 100 only increases latency by 13% (3.9ms → 4.4ms)
See BENCHMARKS.md for detailed performance analysis and methodology.
[dependencies]
metal-candle = "1.3" # or latest from crates.io
Requirements: Rust 1.75+, Apple Silicon (M1/M2/M3/M4), macOS 12.0+
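Optional capabilities used later in this README are gated behind Cargo features; the embeddings and streaming feature names below are taken from the examples in this document:
[dependencies]
metal-candle = { version = "1.3", features = ["embeddings", "streaming"] }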
use metal_candle::inference::{Generator, GeneratorConfig, SamplingStrategy};
use metal_candle::models::Qwen;
// Load model (config and vb come from your checkpoint loader; see ModelLoader)
let model = Qwen::new(&config, vb)?;
// Configure generation
let gen_config = GeneratorConfig {
max_tokens: 128,
sampling: SamplingStrategy::TopP { p: 0.95 },
temperature: 0.7,
repetition_penalty: 1.1, // Reduce repetition
stop_on_eos: true,
eos_token_id: Some(151643), // Qwen EOS token
..Default::default()
};
// Generate tokens
let mut generator = Generator::new(Box::new(model), gen_config)?;
let output_ids = generator.generate(&input_ids)?;
// Or use streaming for real-time generation (v1.3.0+)
generator.generate_stream(&input_ids, |token| {
println!("Token {}: prob={:.2}%", token.token_id, token.probability * 100.0);
true // Continue generation
})?;
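// Returning false from the callback stops generation early. A minimal
// sketch under that assumption (the callback API is the one shown above):
let mut emitted = 0;
generator.generate_stream(&input_ids, |_token| {
    emitted += 1;
    emitted < 32 // stop once 32 tokens have been produced
})?;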
// Async streaming (requires 'streaming' feature)
#[cfg(feature = "streaming")]
{
use futures::stream::StreamExt;
use futures::pin_mut;
let stream = generator.generate_stream_async(&input_ids);
pin_mut!(stream);
while let Some(result) = stream.next().await {
let token = result?;
println!("Token: {}", token.token_id);
}
}
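The async stream must be polled from inside an async runtime; a minimal driver sketch assuming Tokio (which metal-candle itself does not mandate):
// Hypothetical runner for the async streaming example above.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build `generator` and `input_ids` as shown earlier, then poll the
    // stream from inside this async context.
    Ok(())
}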
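For reference, the SamplingStrategy::TopP { p: 0.95 } used above follows the standard nucleus-sampling recipe: sort token probabilities, keep the smallest prefix whose cumulative mass reaches p, and sample from that renormalized set. A hedged sketch of the technique (not the crate's internal implementation):
// Nucleus (top-p) sampling over a probability distribution.
// `probs` must sum to 1; `uniform` is a uniform draw in [0, 1).
fn sample_top_p(probs: &[f32], p: f32, uniform: f32) -> usize {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    // Keep the smallest prefix whose cumulative mass reaches p.
    let mut cum = 0.0;
    let mut kept = Vec::new();
    for &i in &idx {
        cum += probs[i];
        kept.push(i);
        if cum >= p {
            break;
        }
    }
    // Sample from the renormalized prefix.
    let mut target = uniform * cum;
    for &i in &kept {
        target -= probs[i];
        if target <= 0.0 {
            return i;
        }
    }
    *kept.last().unwrap()
}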
use metal_candle::embeddings::{EmbeddingModel, EmbeddingModelType};
use metal_candle::Device;
// Load embedding model with Metal acceleration (25.9x faster than MLX!)
let device = Device::new_metal(0)?;
let model = EmbeddingModel::from_pretrained(
EmbeddingModelType::E5SmallV2,
device,
)?;
// Generate embeddings for semantic search
let texts = vec![
"Rust is a systems programming language",
"Python is a high-level language",
];
let embeddings = model.encode(&texts)?; // [batch, 384] in 3.9ms
// Batch processing: 100 docs in 4.4ms (22,831 docs/sec throughput)
let large_corpus = load_documents()?; // load_documents() is a placeholder for your own corpus loader
let batch_embeddings = model.encode(&large_corpus)?;
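For the search step itself, rank corpus embeddings by cosine similarity against a query embedding. A minimal sketch, assuming the embeddings are available as plain Vec<f32> rows (if encode returns a Candle tensor, convert first, e.g. with to_vec2::<f32>()):
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
// Treat the first embedding as the query; rank documents best-first.
let query = &embeddings[0];
let mut ranked: Vec<(usize, f32)> = batch_embeddings
    .iter()
    .enumerate()
    .map(|(i, doc)| (i, cosine_similarity(query, doc)))
    .collect();
ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());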
use metal_candle::training::{
LoRAAdapter, LoRAAdapterConfig, TargetModule,
Trainer, TrainingConfig, LRScheduler
};
// Create LoRA adapter
let lora_config = LoRAAdapterConfig {
rank: 8,
alpha: 16.0,
dropout: 0.0,
target_modules: vec![TargetModule::QProj, TargetModule::VProj],
};
let adapter = LoRAAdapter::new(&model, lora_config, &device)?;
// Configure and train
let training_config = TrainingConfig {
num_epochs: 3,
lr_scheduler: LRScheduler::warmup_cosine(100, 1000, 1e-4, 1e-6),
..Default::default()
};
let trainer = Trainer::new(adapter, training_config)?;
let metrics = trainer.train(&dataset)?;
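The warmup_cosine(100, 1000, 1e-4, 1e-6) call follows the conventional warmup-cosine shape: a linear ramp to the peak learning rate over the first 100 steps, then cosine decay to the floor by step 1000. A hedged sketch of that schedule, assuming the argument order (warmup_steps, total_steps, peak_lr, min_lr) used above:
fn warmup_cosine_lr(step: usize, warmup: usize, total: usize, peak: f64, min_lr: f64) -> f64 {
    if step < warmup {
        // Linear warmup: 0 -> peak over `warmup` steps.
        peak * step as f64 / warmup as f64
    } else {
        // Cosine decay: peak -> min_lr over the remaining steps.
        let t = (step - warmup) as f64 / (total - warmup) as f64;
        min_lr + 0.5 * (peak - min_lr) * (1.0 + (std::f64::consts::PI * t.min(1.0)).cos())
    }
}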
use metal_candle::training::{AdapterRegistry, LoRAAdapter, LoRAAdapterConfig};
// Create registry for managing multiple adapters
let mut registry = AdapterRegistry::new();
// Load task-specific adapters
let code_adapter = LoRAAdapter::new(768, 3072, 12, &config, &device)?;
let chat_adapter = LoRAAdapter::new(768, 3072, 12, &config, &device)?;
registry.add_adapter("code-assistant".to_string(), code_adapter)?;
registry.add_adapter("chat".to_string(), chat_adapter)?;
// Switch between adapters without reloading base model
registry.activate("code-assistant")?;
// ... use model for code generation ...
registry.activate("chat")?;
// ... use model for chat ...
// Memory efficient: adapters are ~0.03% of base model size
println!("Active adapter: {:?}", registry.active_adapter());
Built on Candle with Metal backend:
┌───────────────────────────────────────────────────────────────┐
│                  metal-candle (Public API)                    │
├───────────────────────────────────────────────────────────────┤
│  Training        │  Inference         │  Models               │
│  • LoRAAdapter   │  • KVCache         │  • ModelLoader        │
│  • Trainer       │  • Sampling        │  • Qwen               │
│  • AdamW         │  • Generator       │  • Config             │
│  • Schedulers    │                    │                       │
│  • Checkpoint    │  Embeddings        │                       │
│                  │  • EmbeddingModel  │                       │
│                  │  • E5/MiniLM/MPNet │                       │
└───────────────────────────────────────────────────────────────┘
                               │
┌───────────────────────────────────────────────────────────────┐
│                       Candle Framework                        │
│   • Tensor operations   • Metal backend   • Autograd          │
└───────────────────────────────────────────────────────────────┘
                               │
┌───────────────────────────────────────────────────────────────┐
│                        Apple Metal API                        │
│              (GPU acceleration on Apple Silicon)              │
└───────────────────────────────────────────────────────────────┘
See ARCHITECTURE.md for detailed architecture documentation.
| Example | Description |
|---|---|
| generate_text.rs | Text generation with streaming and sampling |
| train_lora.rs | End-to-end LoRA training |
| embeddings_demo.rs | Semantic search with embeddings |
| inference_demo.rs | KV-cache and sampling demo |
| load_model.rs | Model loading and inspection |
Run examples:
cargo run --example generate_text
cargo run --example train_lora
cargo run --example embeddings_demo --features embeddings
git clone https://github.com/GarthDB/metal-candle.git
cd metal-candle
cargo build && cargo test
See CONTRIBUTING.md for full guidelines. Quality standards: zero clippy warnings (pedantic), ≥80% coverage, 100% API docs.
See ROADMAP.md for detailed release plans and NEXT_STEPS.md for immediate priorities.
Track progress on the v1.3+ Feature Roadmap project board. Vote with 👍 on issues you'd like to see prioritized!
Contributions welcome! See CONTRIBUTING.md for development standards and testing requirements.
Licensed under Apache-2.0 (see LICENSE), which includes an explicit patent grant for production ML use.
Two unmaintained transitive dependencies (non-security): number_prefix and paste, both pulled in by trusted upstream crates (Candle, Hugging Face). See deny.toml for details.
Maintained by: @GarthDB