canary-rs

Crates.iocanary-rs
lib.rscanary-rs
version0.2.0
created_at2026-01-16 11:55:16.736503+00
updated_at2026-01-16 14:37:32.316031+00
descriptionRust implementation of NVIDIA Canary ASR/AST using ONNX Runtime
homepage
repositoryhttps://github.com/mmende/canary-rs
max_upload_size
id2048482
size127,362
Martin Mende (mmende)

documentation

README

canary-rs

A Rust implementation for NVIDIA's Canary multilingual ASR/AST model using ONNX Runtime.

Usage

Download Canary-1b-v2 or Canary-180m-flash model files from HuggingFace:

  • encoder-model.onnx
  • encoder-model.onnx.data (Canary-1b-v2 only)
  • decoder-model.onnx
  • vocab.txt

or for int8 quantization:

  • encoder-model.int8.onnx
  • decoder-model.int8.onnx
  • vocab.txt

and place them in a directory, e.g., canary-1b-v2.

use canary_rs::{Canary, StreamConfig};

let model = Canary::from_pretrained("canary-1b-v2", None)?;
let mut session = model.session();
let result = session.transcribe_file("audio.wav", "en", "en")?;
println!("Transcription: {}", result.text);

// Or transcribe in-memory audio
// let result = session.transcribe_samples(&audio_samples, sample_rate, channels, "en", "en")?;

for token in result.tokens {
    // Note: Timestamps are dummy values for now.
    println!("Token: {} ({} - {}) (prob: {:.3})", token.text, token.start, token.end, token.prob);
}

// Windowed streaming helper for live audio.
let stream_cfg = StreamConfig::new().with_window_duration(10.0).with_step_duration(2.0);
let mut stream = model.stream("en", "en", stream_cfg)?;
// stream.push_samples(&audio_chunk, sample_rate, channels)?;

Features

  • ort-defaults (default): Enable ONNX Runtime default features.
  • Execution providers (🚧 Mostly untested): cuda, tensorrt, coreml, directml, rocm, openvino, webgpu, nnapi.
  • Dynamic loading: load-dynamic, preload-dylibs (see ort docs).

Logging

This crate uses the log crate for warnings and diagnostic messages. Configure a logger in your binary (for example, env_logger or tracing-subscriber) to see output.

Notes

Timestamps aren't working right now and are just dummy values because Canary doesn't emit timestamp tokens from the decoder. In the original NeMo implementation, timestamps are generated in a separate post-decode step using forced alignment with an auxiliary CTC model.

License

MIT License. See LICENSE.md for details.

Acknowledgments

Commit count: 12

cargo fmt