| Crates.io | canary-rs |
| lib.rs | canary-rs |
| version | 0.2.0 |
| created_at | 2026-01-16 11:55:16.736503+00 |
| updated_at | 2026-01-16 14:37:32.316031+00 |
| description | Rust implementation of NVIDIA Canary ASR/AST using ONNX Runtime |
| homepage | |
| repository | https://github.com/mmende/canary-rs |
| max_upload_size | |
| id | 2048482 |
| size | 127,362 |
A Rust implementation for NVIDIA's Canary multilingual ASR/AST model using ONNX Runtime.
Download Canary-1b-v2 or Canary-180m-flash model files from HuggingFace:
encoder-model.onnxencoder-model.onnx.data (Canary-1b-v2 only)decoder-model.onnxvocab.txtor for int8 quantization:
encoder-model.int8.onnxdecoder-model.int8.onnxvocab.txtand place them in a directory, e.g., canary-1b-v2.
use canary_rs::{Canary, StreamConfig};
let model = Canary::from_pretrained("canary-1b-v2", None)?;
let mut session = model.session();
let result = session.transcribe_file("audio.wav", "en", "en")?;
println!("Transcription: {}", result.text);
// Or transcribe in-memory audio
// let result = session.transcribe_samples(&audio_samples, sample_rate, channels, "en", "en")?;
for token in result.tokens {
// Note: Timestamps are dummy values for now.
println!("Token: {} ({} - {}) (prob: {:.3})", token.text, token.start, token.end, token.prob);
}
// Windowed streaming helper for live audio.
let stream_cfg = StreamConfig::new().with_window_duration(10.0).with_step_duration(2.0);
let mut stream = model.stream("en", "en", stream_cfg)?;
// stream.push_samples(&audio_chunk, sample_rate, channels)?;
ort-defaults (default): Enable ONNX Runtime default features.cuda, tensorrt, coreml, directml, rocm, openvino, webgpu, nnapi.load-dynamic, preload-dylibs (see ort docs).This crate uses the log crate for warnings and diagnostic messages. Configure a logger in
your binary (for example, env_logger or tracing-subscriber) to see output.
Timestamps aren't working right now and are just dummy values because Canary doesn't emit timestamp tokens from the decoder. In the original NeMo implementation, timestamps are generated in a separate post-decode step using forced alignment with an auxiliary CTC model.
MIT License. See LICENSE.md for details.