silero_vad_burn

Crates.io: silero_vad_burn
lib.rs: silero_vad_burn
version: 0.1.1
created_at: 2026-01-19 21:05:40.406403+00
updated_at: 2026-01-20 15:32:09.719916+00
description: A Silero-VAD inference library in Rust based on the Burn framework
repository: https://github.com/second-state/silero_vad_burn
size: 5,524,025
owner: zzz (L-jasmine)

README

Silero-VAD V6 in Rust (based on Burn)

This is a Rust implementation of Silero-VAD V6. Silero-VAD is a Voice Activity Detection (VAD) system that can detect speech segments in audio files, separating speech from silence and noise.

What is VAD?

Voice Activity Detection (VAD) is used to:

  • Detect speech segments in audio recordings

  • Generate timestamps for when speech starts and ends

  • Remove silence from audio for further processing

  • Preprocess audio for speech recognition systems

  • Analyze conversations and meetings

Requirements

  • Rust: 1.90.0 or later

  • LibTorch: 1.13.0 or later

  • System: GCC 11.4.0+ (Linux), MSVC 2019+ (Windows), or Xcode Command Line Tools (macOS)

Why Burn?

Silero VAD is extremely compact—the ONNX model is only 2MB. However, dependencies like ONNX Runtime and LibTorch are massive, often requiring hundreds of megabytes or even gigabytes of disk space. Installing such heavy libraries just to run a tiny 2MB model is clearly overkill.

Burn supports a wide variety of backends, and for Silero VAD, the ndarray backend is more than sufficient. With release optimizations, the entire binary—including model weights—weighs in at under 10MB and is compatible with virtually any machine.
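As the usage example below shows, the model type is generic over a Burn backend, so picking the ndarray backend is purely a type-level choice. The following sketch (which assumes `SileroVAD6Model` is exported at the crate root) illustrates that pattern; it introduces no API beyond what the usage example already relies on.

    // Sketch: selecting Burn's pure-Rust ndarray backend by type. Swapping the
    // alias is how you would target a different Burn backend.
    use burn_ndarray::{NdArray, NdArrayDevice};
    use silero_vad_burn::SileroVAD6Model; // assumed export path

    // The CPU backend used throughout this README.
    type Backend = NdArray<f32>;

    fn build_model(device: &NdArrayDevice) -> SileroVAD6Model<Backend> {
        // `new` is assumed to initialize the model on the given device,
        // exactly as in the usage example below.
        SileroVAD6Model::new(device).expect("failed to build the VAD model")
    }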

Usage
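
The example below streams a WAV file through the model in fixed-size chunks, threading the prediction state from one call to the next and printing the model output for each chunk. The `test_wav/hello_beep.wav` path is only illustrative; substitute your own audio file.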

    // Imports for this snippet. `Tensor` is Burn's tensor type; the paths of
    // `SileroVAD6Model` and `PredictState` are assumed to be the crate root,
    // so adjust them if the crate exports these types elsewhere.
    use burn::tensor::Tensor;
    use silero_vad_burn::{PredictState, SileroVAD6Model};

    fn main() {
        // Build the model and its streaming prediction state on the CPU ndarray backend.
        let device = burn_ndarray::NdArrayDevice::default();
        let model: SileroVAD6Model<burn_ndarray::NdArray<f32>> =
            SileroVAD6Model::new(&device).unwrap();
        let mut predict_state = PredictState::default(&device);

        // Decode the WAV file into f32 samples.
        let file_in = std::fs::File::open("test_wav/hello_beep.wav").unwrap();
        let (header, wav_data) = wav_io::read_from_file(file_in).unwrap();
        println!("wav header: {:?}", header);

        // The model consumes fixed-size chunks; a trailing partial chunk is skipped.
        let chunk_size = predict_state.input_size();

        for chunk in wav_data.chunks(chunk_size) {
            if chunk.len() != chunk_size {
                break;
            }
            let input =
                Tensor::<burn_ndarray::NdArray<f32>, 1>::from_data(chunk, &device).unsqueeze();

            // Each call returns the updated state and the model output for this chunk.
            let (new_state, output) = model.predict(predict_state, input).unwrap();
            predict_state = new_state;
            println!("output: {:?}", output);
        }
    }
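
Silero-VAD's per-chunk outputs can be turned into the speech timestamps mentioned earlier with a simple thresholding pass. The helper below is a minimal sketch, not part of this crate's API: it assumes each `predict` call yields one speech probability per chunk and takes the chunk size and sample rate as plain parameters.

    /// Minimal post-processing sketch (not part of silero_vad_burn): convert
    /// per-chunk speech probabilities into (start, end) timestamps in seconds.
    fn speech_timestamps(
        probs: &[f32],
        chunk_size: usize,
        sample_rate: usize,
        threshold: f32,
    ) -> Vec<(f32, f32)> {
        let chunk_secs = chunk_size as f32 / sample_rate as f32;
        let mut segments = Vec::new();
        let mut start: Option<usize> = None;

        for (i, &p) in probs.iter().enumerate() {
            if p >= threshold {
                // Entering (or continuing) a speech segment.
                if start.is_none() {
                    start = Some(i);
                }
            } else if let Some(s) = start.take() {
                // Probability dropped below the threshold: close the segment.
                segments.push((s as f32 * chunk_secs, i as f32 * chunk_secs));
            }
        }
        // Close a segment that runs to the end of the audio.
        if let Some(s) = start {
            segments.push((s as f32 * chunk_secs, probs.len() as f32 * chunk_secs));
        }
        segments
    }

In practice you would collect the per-chunk outputs from the loop above into a `Vec<f32>` before calling this helper; production pipelines typically also add padding and minimum-duration filtering around the detected segments.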

License

This project follows the same license as the original Silero-VAD project.
