metal-candle

Production-quality Rust ML library for Apple Silicon - LoRA training, text generation, and semantic embeddings

Overview

Pure Rust machine learning library optimized for Apple Silicon:

  • LoRA Training: Fine-tune transformer models efficiently
  • Text Generation: Streaming, multiple sampling strategies, repetition penalty
  • Semantic Embeddings: E5, MiniLM, MPNet models for RAG and search
  • Metal Acceleration: Native GPU acceleration on M-series chips

Why metal-candle? Up to 25.9x faster than MLX for embeddings, single-binary deployment, type-safe ML, and production-ready quality (407 tests, 81.6% coverage).

Performance

metal-candle demonstrates exceptional performance on Apple Silicon:

Task         Batch Size     metal-candle      MLX            Speedup
Embeddings   100 docs       4.4ms             113.5ms        25.9x πŸš€
Embeddings   Single query   3.9ms             7.7ms          2.0x
Throughput   -              22,831 docs/sec   881 docs/sec   25.9x

Near constant-time batching: growing the batch from 1 to 100 documents increases latency by only 13% (3.9ms β†’ 4.4ms).
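
The throughput row follows directly from the batch latency: 100 documents / 4.4ms β‰ˆ 22,700 docs/sec, consistent with the measured 22,831 docs/sec.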

See BENCHMARKS.md for detailed performance analysis and methodology.

Installation

[dependencies]
metal-candle = "1.3"  # or latest from crates.io

Requirements: Rust 1.75+, Apple Silicon (M1/M2/M3/M4), macOS 12.0+
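
If you plan to use the optional capabilities shown below, enable the corresponding Cargo features. A sketch, assuming the embeddings and streaming features mentioned in this README are opt-in:

[dependencies]
metal-candle = { version = "1.3", features = ["embeddings", "streaming"] }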

Quick Start

Text Generation

use metal_candle::inference::{Generator, GeneratorConfig, SamplingStrategy};
use metal_candle::models::Qwen;

// Load the model (config and vb come from your model-loading code,
// e.g. a safetensors checkpoint; see the load_model.rs example)
let model = Qwen::new(&config, vb)?;

// Configure generation
let gen_config = GeneratorConfig {
    max_tokens: 128,
    sampling: SamplingStrategy::TopP { p: 0.95 },
    temperature: 0.7,
    repetition_penalty: 1.1,  // Reduce repetition
    stop_on_eos: true,
    eos_token_id: Some(151643),  // Qwen EOS token
    ..Default::default()
};

// Generate tokens
let mut generator = Generator::new(Box::new(model), gen_config)?;
let output_ids = generator.generate(&input_ids)?;

// Or use streaming for real-time generation (v1.3.0+)
generator.generate_stream(&input_ids, |token| {
    println!("Token {}: prob={:.2}%", token.token_id, token.probability * 100.0);
    true // Continue generation
})?;

// Async streaming (requires the 'streaming' feature; run inside an async fn)
#[cfg(feature = "streaming")]
{
    use futures::stream::StreamExt;
    use futures::pin_mut;
    
    let stream = generator.generate_stream_async(&input_ids);
    pin_mut!(stream);
    
    while let Some(result) = stream.next().await {
        let token = result?;
        println!("Token: {}", token.token_id);
    }
}
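
For intuition about what SamplingStrategy::TopP { p: 0.95 } does, here is a minimal generic nucleus-sampling sketch. It is not metal-candle's internal implementation: probs stands for a softmaxed probability vector and rand_uniform for a uniform draw in [0, 1), both hypothetical.

// Nucleus (top-p) sampling: draw only from the smallest set of tokens
// whose cumulative probability reaches p.
fn sample_top_p(probs: &[f32], p: f32, rand_uniform: f32) -> usize {
    // Sort token indices by descending probability.
    let mut indices: Vec<usize> = (0..probs.len()).collect();
    indices.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    // Keep the smallest prefix whose cumulative mass reaches p.
    let mut nucleus = Vec::new();
    let mut cumulative = 0.0;
    for &i in &indices {
        nucleus.push(i);
        cumulative += probs[i];
        if cumulative >= p {
            break;
        }
    }

    // Renormalize over the nucleus and sample from it.
    let target = rand_uniform * cumulative;
    let mut acc = 0.0;
    for &i in &nucleus {
        acc += probs[i];
        if acc >= target {
            return i;
        }
    }
    *nucleus.last().unwrap()
}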

Semantic Embeddings (RAG & Search)

use metal_candle::embeddings::{EmbeddingModel, EmbeddingModelType};
use metal_candle::Device;

// Load embedding model with Metal acceleration (25.9x faster than MLX!)
let device = Device::new_metal(0)?;
let model = EmbeddingModel::from_pretrained(
    EmbeddingModelType::E5SmallV2,
    device,
)?;

// Generate embeddings for semantic search
let texts = vec![
    "Rust is a systems programming language",
    "Python is a high-level language",
];
let embeddings = model.encode(&texts)?;  // [batch, 384] in 3.9ms

// Batch processing: 100 docs in 4.4ms (22,831 docs/sec throughput)
let large_corpus = load_documents()?;
let batch_embeddings = model.encode(&large_corpus)?;
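
Once you have embeddings, semantic search reduces to similarity ranking. A generic sketch, assuming you have converted the model's output tensors into plain Vec<f32> vectors (the conversion depends on metal-candle's return type and is not shown):

// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

// Rank a corpus against a query embedding, best match first.
fn rank(query: &[f32], docs: &[Vec<f32>]) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}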

LoRA Training

use metal_candle::training::{
    LoRAAdapter, LoRAAdapterConfig, TargetModule,
    Trainer, TrainingConfig, LRScheduler
};

// Create LoRA adapter
let lora_config = LoRAAdapterConfig {
    rank: 8,
    alpha: 16.0,
    dropout: 0.0,
    target_modules: vec![TargetModule::QProj, TargetModule::VProj],
};
let adapter = LoRAAdapter::new(&model, lora_config, &device)?;

// Configure and train
let training_config = TrainingConfig {
    num_epochs: 3,
    lr_scheduler: LRScheduler::warmup_cosine(100, 1000, 1e-4, 1e-6),
    ..Default::default()
};
let trainer = Trainer::new(adapter, training_config)?;
let metrics = trainer.train(&dataset)?;
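
For intuition on why LoRA is cheap: it learns a low-rank update W' = W + (alpha / r) * B * A, with A of shape r x d and B of shape d x r (the standard LoRA formulation; metal-candle's exact scaling is an assumption here). At rank r = 8 on a 768-dimensional projection, each adapted matrix trains 2 * 768 * 8 = 12,288 parameters against 768^2 = 589,824 frozen ones, roughly 2% per targeted matrix and a far smaller fraction of the whole model.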

LoRA Adapter Management (v1.3.0+)

use metal_candle::training::{AdapterRegistry, LoRAAdapter, LoRAAdapterConfig};

// Create registry for managing multiple adapters
let mut registry = AdapterRegistry::new();

// Load task-specific adapters
let code_adapter = LoRAAdapter::new(768, 3072, 12, &config, &device)?;
let chat_adapter = LoRAAdapter::new(768, 3072, 12, &config, &device)?;

registry.add_adapter("code-assistant".to_string(), code_adapter)?;
registry.add_adapter("chat".to_string(), chat_adapter)?;

// Switch between adapters without reloading base model
registry.activate("code-assistant")?;
// ... use model for code generation ...

registry.activate("chat")?;
// ... use model for chat ...

// Memory efficient: adapters are ~0.03% of base model size
println!("Active adapter: {:?}", registry.active_adapter());

Features

  • Training: LoRA with dropout, AdamW optimizer, learning rate schedulers, checkpoint management, adapter registry (v1.3.0+)
  • Inference: KV-cache, multiple sampling strategies, streaming generation (sync & async), repetition penalty, rich token metadata (v1.3.0+)
  • Models: Qwen2.5-Coder, safetensors format, transformer components (RoPE, GQA, MLP)
  • Embeddings: E5, MiniLM, MPNet with HuggingFace Hub integration
  • Quality: 407 tests, 81.6% coverage, strict clippy linting, 100% API documentation

Architecture

Built on Candle with Metal backend:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    metal-candle (Public API)                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Training          β”‚  Inference         β”‚  Models            β”‚
β”‚  β€’ LoRAAdapter     β”‚  β€’ KVCache         β”‚  β€’ ModelLoader     β”‚
β”‚  β€’ Trainer         β”‚  β€’ Sampling        β”‚  β€’ Qwen            β”‚
β”‚  β€’ AdamW           β”‚  β€’ Generator       β”‚  β€’ Config          β”‚
β”‚  β€’ Schedulers      β”‚                    β”‚                    β”‚
β”‚  β€’ Checkpoint      β”‚  Embeddings        β”‚                    β”‚
β”‚                    β”‚  β€’ EmbeddingModel  β”‚                    β”‚
β”‚                    β”‚  β€’ E5/MiniLM/MPNet β”‚                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Candle Framework                        β”‚
β”‚  β€’ Tensor operations  β€’ Metal backend  β€’ Autograd            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Apple Metal API                         β”‚
β”‚  (GPU acceleration on Apple Silicon)                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

See ARCHITECTURE.md for detailed architecture documentation.
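
Because everything below the public API is Candle, device selection uses Candle's Device type (re-exported as metal_candle::Device in the Quick Start). A sketch of preferring Metal with a CPU fallback, assuming the re-export matches Candle's API:

use metal_candle::Device;

// Try the Metal GPU first; fall back to CPU if it is unavailable.
let device = Device::new_metal(0).unwrap_or(Device::Cpu);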

Documentation

API reference: https://docs.rs/metal-candle

Examples

Example             Description
generate_text.rs    Text generation with streaming and sampling
train_lora.rs       End-to-end LoRA training
embeddings_demo.rs  Semantic search with embeddings
inference_demo.rs   KV-cache and sampling demo
load_model.rs       Model loading and inspection

Run examples:

cargo run --example generate_text
cargo run --example train_lora
cargo run --example embeddings_demo --features embeddings

Development

git clone https://github.com/GarthDB/metal-candle.git
cd metal-candle
cargo build && cargo test

See CONTRIBUTING.md for full guidelines. Quality standards: zero clippy warnings (with pedantic lints), β‰₯80% test coverage, 100% public API documentation.
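
A typical local check before opening a PR (a sketch; CI's exact invocation may differ):

cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo test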

Roadmap

See ROADMAP.md for detailed release plans and NEXT_STEPS.md for immediate priorities.

Upcoming Releases

Track progress on the v1.3+ Feature Roadmap project board. Vote with πŸ‘ on issues you'd like to see prioritized!

Contributing

Contributions welcome! See CONTRIBUTING.md for development standards and testing requirements.

License

Licensed under Apache-2.0 (see LICENSE), which provides an explicit patent grant, useful for production ML deployments.

Acknowledgments

Built on Candle by Hugging Face, which provides the tensor operations and Metal backend.

Known Advisories

Two unmaintained transitive dependencies (no security impact): number_prefix and paste, pulled in by trusted upstream crates (Candle, Hugging Face). See deny.toml for details.

Support


Maintained by: @GarthDB
