| Crates.io | silero-vad-rust |
| lib.rs | silero-vad-rust |
| version | 6.2.1 |
| created_at | 2025-11-09 10:43:09.045321+00 |
| updated_at | 2025-11-09 15:04:51.13053+00 |
| description | Rust port of the Silero Voice Activity Detector (VAD) running the bundled ONNX models |
| homepage | https://github.com/sheldonix/silero-vad-rust |
| repository | https://github.com/sheldonix/silero-vad-rust |
| max_upload_size | |
| id | 1923975 |
| size | 6,900,900 |
Rust port of the Silero Voice Activity Detector that runs the pre-trained ONNX models through the safe ort bindings. The crate bundles the original ONNX weights and exposes idiomatic Rust helpers for loading audio, running the network, and post-processing speech segments.
opset 15 & 16) inside src/silero_vad/data, so no extra downloads are required.LoadOptions.forward_chunk keeps internal state for long-running streams, while audio_forward processes entire buffers offline.VadParameters lets you tune thresholds, speech/silence windows, and return units (samples or seconds) for rapid adaptation to new setups.read_audio, save_audio, get_speech_timestamps, collect_chunks, drop_chunks, and VadIterator for quick end-to-end adoption.cargo add silero-vad-rust
The crate expects the ONNX Runtime shared library to be discoverable at runtime (ort with load-dynamic). See the ort crate docs if you need to pin a specific runtime build.
use silero_vad_rust::{get_speech_timestamps, load_silero_vad, read_audio};
use silero_vad_rust::silero_vad::utils_vad::VadParameters;
fn main() -> anyhow::Result<()> {
let audio = read_audio("samples/test.wav", 16_000)?;
let mut model = load_silero_vad()?; // defaults to ONNX opset 16
let params = VadParameters {
return_seconds: true,
..Default::default()
};
let speech = get_speech_timestamps(&audio, &mut model, ¶ms)?;
println!("Detected segments: {speech:?}");
Ok(())
}
use silero_vad_rust::{load_silero_vad, read_audio};
fn stream_chunks() -> anyhow::Result<()> {
let audio = read_audio("samples/long.wav", 16_000)?;
let mut model = load_silero_vad()?;
let chunk_size = 512; // 16 kHz window
for frame in audio.chunks(chunk_size) {
let padded = if frame.len() == chunk_size {
frame.to_vec()
} else {
let mut tmp = vec![0.0f32; chunk_size];
tmp[..frame.len()].copy_from_slice(frame);
tmp
};
let probs = model.forward_chunk(&padded, 16_000)?;
println!("frame prob={:.3}", probs[[0, 0]]);
}
Ok(())
}
use silero_vad_rust::{
collect_chunks, drop_chunks, get_speech_timestamps, load_silero_vad, read_audio, save_audio,
};
use silero_vad_rust::silero_vad::utils_vad::VadParameters;
fn trim_audio() -> anyhow::Result<()> {
let audio = read_audio("samples/raw.wav", 16_000)?;
let mut model = load_silero_vad()?;
let params = VadParameters {
return_seconds: false,
..Default::default()
};
let speech = get_speech_timestamps(&audio, &mut model, ¶ms)?;
let voice_only = collect_chunks(&speech, &audio, false, None)?;
save_audio("out_voice.wav", &voice_only, 16_000)?;
let muted_voice = drop_chunks(&speech, &audio, false, None)?;
save_audio("out_silence.wav", &muted_voice, 16_000)?;
Ok(())
}
use silero_vad_rust::{
load_silero_vad, read_audio,
silero_vad::utils_vad::{VadEvent, VadIterator, VadIteratorParams},
};
fn iterate_events() -> anyhow::Result<()> {
let audio = read_audio("samples/live.wav", 16_000)?;
let model = load_silero_vad()?;
let params = VadIteratorParams {
threshold: 0.55,
..Default::default()
};
let mut iterator = VadIterator::new(model, params)?;
for frame in audio.chunks(512) {
let event = iterator.process_chunk(frame, true, 1)?;
if let Some(VadEvent::Start(ts)) = event {
println!("speech started at {ts}s");
} else if let Some(VadEvent::End(ts)) = event {
println!("speech ended at {ts}s");
}
}
Ok(())
}
use silero_vad_rust::silero_vad::model::{load_silero_vad_with_options, LoadOptions};
fn load_gpu_model() -> anyhow::Result<()> {
let options = LoadOptions {
opset_version: 15,
force_onnx_cpu: false, // allow custom providers (GPU, NNAPI, etc.)
..Default::default()
};
let _model = load_silero_vad_with_options(options)?;
Ok(())
}
To actually run on GPU, enable the matching
ortfeature in your ownCargo.toml(for exampleort = { version = "2.0.0-rc.10", features = ["load-dynamic", "cuda"] }) and pointORT_DYLIB_PATH(or your system library path) to a GPU-enabled ONNX Runtime binary. Withforce_onnx_cpu = false, the runtime will use whatever providers were compiled into that library; if you need to prioritize a specific provider (e.g. CUDAExecutionProvider), extendOnnxModel::from_pathto register it explicitly.
use silero_vad_rust::{get_speech_timestamps, load_silero_vad, read_audio};
use silero_vad_rust::silero_vad::utils_vad::VadParameters;
fn compare_thresholds() -> anyhow::Result<()> {
let audio = read_audio("samples/noisy.wav", 16_000)?;
let mut model = load_silero_vad()?;
let mut strict = VadParameters::default();
strict.threshold = 0.65;
strict.min_speech_duration_ms = 400;
let mut permissive = VadParameters::default();
permissive.threshold = 0.4;
permissive.min_speech_duration_ms = 150;
let strict_segments = get_speech_timestamps(&audio, &mut model, &strict)?;
model.reset_states();
let permissive_segments = get_speech_timestamps(&audio, &mut model, &permissive)?;
println!("strict count: {}", strict_segments.len());
println!("permissive count: {}", permissive_segments.len());
Ok(())
}
cargo fmt
cargo clippy --all-targets
cargo test
Integration tests use WAV fixtures in tests/data.
load_silero_vad_with_options lets you pick opset_version 15/16 and toggle force_onnx_cpu.use_onnx = true.cargo build --no-default-features.Distributed under the MIT License, matching the upstream Silero VAD project. See LICENSE for details.