voice-engine

Crates.iovoice-engine
lib.rsvoice-engine
version0.3.31
created_at2025-11-26 08:42:51.144071+00
updated_at2025-12-31 08:55:44.808569+00
descriptionA voip voice processing engine in Rust
homepage
repositoryhttps://github.com/restsend/voice-engine
max_upload_size
id1951129
size5,825,127
jinti (shenjinti)

documentation

README

Voice Engine (Deprecated)

Note: This project has been merged into active-call. This repository is no longer maintained.

A robust media processing library for Rust, designed for building voice applications.

Features

  • Codecs: Support for G.711 (PCMA/PCMU), G.722, G.729, and Opus.
  • Media Processing: Includes Jitter Buffer, Resampling, and DTMF handling.
  • Voice Activity Detection (VAD): Integrated Silero VAD and Ten VAD. Both are pure Rust implementations without onnxruntime dependency, offering superior performance.
  • Noise Reduction: Built-in denoiser.
  • Speech Services:
    • ASR: Aliyun, Tencent.
    • TTS: Aliyun, Deepgram, Tencent.
  • Transport: RTP and WebRTC support.

Usage

This library is intended to be used as a component in voice applications like rustpbx.

Performance

We have optimized the VAD engines to achieve state-of-the-art performance in pure Rust, significantly outperforming ONNX Runtime.

Benchmark Environment: macOS, Apple Silicon (Single Core), 60s 16kHz Audio

VAD Engine Implementation Time (60s) FPS RTF (Real Time Factor) Note
TinyTen Rust (Hand-opt) ~31.5 ms ~59,600 ~1900x Extremely lightweight
TinySilero Rust (Optimized) ~76.4 ms ~24,500 ~785x >2x faster than ONNX
ONNX Ten ONNX Runtime ~99.6 ms ~37,600 ~760x
ONNX Silero ONNX Runtime ~158.3 ms ~11,800 ~380x Standard baseline
WebRTC VAD C/C++ (Bind) ~3.1 ms ~960,000 ~20,000x Legacy, less accurate

Note: The Rust implementations are fully specialized for the specific model architectures, using techniques like loop unrolling and unsafe pointer arithmetic to eliminate overhead.

Commit count: 0

cargo fmt