| Field | Value |
|---|---|
| Crates.io | voirs |
| lib.rs | voirs |
| version | 0.1.0-alpha.2 |
| created_at | 2025-07-04 11:16:48.344093+00 |
| updated_at | 2025-10-04 15:08:50.821492+00 |
| description | Advanced voice synthesis and speech processing library for Rust |
| homepage | https://github.com/cool-japan/voirs |
| repository | https://github.com/cool-japan/voirs |
| max_upload_size | |
| id | 1737825 |
| size | 1,102,806 |
Democratize state-of-the-art speech synthesis with a fully open, memory-safe, and hardware-portable stack built 100% in Rust.
VoiRS is a cutting-edge Text-to-Speech (TTS) framework that unifies high-performance crates from the cool-japan ecosystem (SciRS2, NumRS2, PandRS, TrustformeRS) into a cohesive neural speech synthesis solution.
**Alpha Release (0.1.0-alpha.2, 2025-10-04):** Core TTS functionality is working and production-ready. **NEW:** the complete DiffWave vocoder training pipeline is now functional, with real parameter saving and gradient-based learning. Well suited to researchers and early adopters who want to train custom vocoders.
```bash
# Install the CLI tool
cargo install voirs-cli

# Or add to your Rust project
cargo add voirs
```
```rust
use voirs::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Build a synthesis pipeline with a specific voice.
    let pipeline = VoirsPipeline::builder()
        .with_voice("en-US-female-calm")
        .build()
        .await?;

    let audio = pipeline
        .synthesize("Hello, world! This is VoiRS speaking in pure Rust.")
        .await?;

    audio.save_wav("output.wav")?;
    Ok(())
}
```
```bash
# Basic synthesis
voirs synth "Hello world" output.wav

# With voice selection
voirs synth "Hello world" output.wav --voice en-US-male-energetic

# SSML support
voirs synth '<speak><emphasis level="strong">Hello</emphasis> world!</speak>' output.wav

# Streaming synthesis
voirs synth --stream "Long text content..." output.wav

# List available voices
voirs voices list
```
```bash
# Train a DiffWave vocoder on the LJSpeech dataset
voirs train vocoder \
  --data /path/to/LJSpeech-1.1 \
  --output checkpoints/diffwave \
  --model-type diffwave \
  --epochs 1000 \
  --batch-size 16 \
  --lr 0.0002 \
  --gpu

# Expected output:
# ✅ Real forward pass SUCCESS! Loss: 25.35
# 💾 Checkpoints saved: 370 parameters, 30MB per file
# 📊 Model: 1,475,136 trainable parameters

# Verify training progress
cat checkpoints/diffwave/best_model.json | jq '{epoch, train_loss, val_loss}'
```
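The same checkpoint metadata can be inspected programmatically. Below is a minimal, dependency-free sketch: `json_number` is a hypothetical helper (not part of VoiRS), and the assumption is that `best_model.json` is a small flat JSON object with numeric fields named as in the `jq` query above.

```rust
// Hypothetical helper: extract a top-level numeric field from a small,
// flat JSON document such as best_model.json, without external crates.
// The field names (epoch, train_loss, val_loss) mirror the jq query
// above; the exact checkpoint layout is an assumption.
fn json_number(json: &str, key: &str) -> Option<f64> {
    let needle = format!("\"{key}\":");
    let start = json.find(&needle)? + needle.len();
    let rest = json[start..].trim_start();
    // Take the longest prefix that still looks like a JSON number.
    let end = rest
        .find(|c: char| !(c.is_ascii_digit() || "+-.eE".contains(c)))
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}

fn main() {
    let checkpoint = r#"{"epoch": 42, "train_loss": 25.35, "val_loss": 27.1}"#;
    println!("epoch      = {:?}", json_number(checkpoint, "epoch"));
    println!("train_loss = {:?}", json_number(checkpoint, "train_loss"));
}
```

For real use, a proper JSON parser such as `serde_json` is the idiomatic choice; the point here is only how little structure the checkpoint summary needs.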
Training Features:
VoiRS follows a modular pipeline architecture:

```
Text Input → G2P → Acoustic Model → Vocoder → Audio Output
     ↓        ↓          ↓             ↓           ↓
   SSML   Phonemes  Mel Spectrograms Neural    WAV/OGG
```
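The data flow above can be sketched as plain Rust traits, one per stage. This is an illustrative model of the pipeline shape only; every trait, type, and implementation name here is hypothetical and does not correspond to the actual `voirs-sdk` API.

```rust
// Illustrative types for each intermediate representation in the pipeline.
struct Phonemes(Vec<String>);
struct MelSpectrogram(Vec<[f32; 80]>); // one 80-bin frame per step (assumed shape)
struct Waveform(Vec<f32>);

// One trait per pipeline stage.
trait G2p { fn to_phonemes(&self, text: &str) -> Phonemes; }
trait Acoustic { fn to_mel(&self, phonemes: &Phonemes) -> MelSpectrogram; }
trait Vocoder { fn to_waveform(&self, mel: &MelSpectrogram) -> Waveform; }

// The whole pipeline is just stage composition.
fn synthesize(g: &dyn G2p, a: &dyn Acoustic, v: &dyn Vocoder, text: &str) -> Waveform {
    v.to_waveform(&a.to_mel(&g.to_phonemes(text)))
}

// Dummy implementations so the sketch runs end to end.
struct NaiveG2p;
impl G2p for NaiveG2p {
    fn to_phonemes(&self, text: &str) -> Phonemes {
        Phonemes(text.split_whitespace().map(str::to_owned).collect())
    }
}
struct OneFramePerPhoneme;
impl Acoustic for OneFramePerPhoneme {
    fn to_mel(&self, phonemes: &Phonemes) -> MelSpectrogram {
        MelSpectrogram(vec![[0.0; 80]; phonemes.0.len()])
    }
}
struct SilentVocoder;
impl Vocoder for SilentVocoder {
    fn to_waveform(&self, mel: &MelSpectrogram) -> Waveform {
        Waveform(vec![0.0; mel.0.len() * 256]) // 256 samples/frame (hop-size assumption)
    }
}

fn main() {
    let wav = synthesize(&NaiveG2p, &OneFramePerPhoneme, &SilentVocoder, "hello world");
    println!("{} samples", wav.0.len()); // 2 words -> 2 frames -> 512 samples
}
```

Because each stage is a trait object, backends (Phonetisaurus vs. OpenJTalk, HiFi-GAN vs. DiffWave) can be swapped without touching the composition function.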
| Component | Description | Backends | Training |
|---|---|---|---|
| G2P | Grapheme-to-Phoneme conversion | Phonetisaurus, OpenJTalk, Neural | ✅ |
| Acoustic | Text → Mel spectrogram | VITS, FastSpeech2 | 🚧 |
| Vocoder | Mel → Waveform | HiFi-GAN, DiffWave | ✅ DiffWave |
| Dataset | Training data utilities | LJSpeech, JVS, Custom | ✅ |
```
voirs/
├── crates/
│   ├── voirs-g2p/       # Grapheme-to-Phoneme conversion
│   ├── voirs-acoustic/  # Neural acoustic models (VITS)
│   ├── voirs-vocoder/   # Neural vocoders (HiFi-GAN/DiffWave) + training
│   ├── voirs-dataset/   # Dataset loading and preprocessing
│   ├── voirs-cli/       # Command-line interface + training commands
│   ├── voirs-ffi/       # C/Python bindings
│   └── voirs-sdk/       # Unified public API
├── models/              # Pre-trained model zoo
├── checkpoints/         # Training checkpoints (SafeTensors)
└── examples/            # Usage examples
```
```bash
# Clone repository
git clone https://github.com/cool-japan/voirs.git
cd voirs

# CPU-only build
cargo build --release

# GPU-accelerated build
cargo build --release --features gpu

# WebAssembly build
cargo build --target wasm32-unknown-unknown --release

# All features
cargo build --release --all-features

# Run tests
cargo nextest run --no-fail-fast

# Run benchmarks
cargo bench

# Check code quality
cargo clippy --all-targets --all-features -- -D warnings
cargo fmt --check

# Train a model (NEW in v0.1.0-alpha.2!)
voirs train vocoder --data /path/to/dataset --output checkpoints/my-model --model-type diffwave

# Monitor training
tail -f checkpoints/my-model/training.log
```
| Language | G2P Backend | Status | Quality |
|---|---|---|---|
| English (US) | Phonetisaurus | ✅ Production | MOS 4.5 |
| English (UK) | Phonetisaurus | ✅ Production | MOS 4.4 |
| Japanese | OpenJTalk | ✅ Production | MOS 4.3 |
| Spanish | Neural G2P | 🚧 Beta | MOS 4.1 |
| French | Neural G2P | 🚧 Beta | MOS 4.0 |
| German | Neural G2P | 🚧 Beta | MOS 4.0 |
| Mandarin | Neural G2P | 🚧 Beta | MOS 3.9 |
| Hardware | Backend | RTF | Notes |
|---|---|---|---|
| Intel i7-12700K | CPU | 0.28× | 8-core, 22kHz synthesis |
| Apple M2 Pro | CPU | 0.25× | 12-core, 22kHz synthesis |
| RTX 4080 | CUDA | 0.04× | Batch size 1, 22kHz |
| RTX 4090 | CUDA | 0.03× | Batch size 1, 22kHz |
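RTF in the table above is the real-time factor: wall-clock synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster-than-real-time synthesis. A minimal helper illustrating the arithmetic (not part of the VoiRS API):

```rust
/// Real-time factor: seconds spent synthesizing per second of audio produced.
/// RTF < 1.0 means faster than real time.
fn real_time_factor(synthesis_secs: f64, audio_secs: f64) -> f64 {
    assert!(audio_secs > 0.0, "audio duration must be positive");
    synthesis_secs / audio_secs
}

fn main() {
    // E.g. 2.8 s of compute for 10 s of audio matches the 0.28x CPU row above.
    let rtf = real_time_factor(2.8, 10.0);
    println!("RTF = {rtf:.2}"); // prints "RTF = 0.28"
    assert!(rtf < 1.0);
}
```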
Explore the `examples/` directory for comprehensive usage patterns:

- `simple_synthesis.rs` – Basic text-to-speech
- `batch_synthesis.rs` – Process multiple inputs
- `streaming_synthesis.rs` – Real-time synthesis
- `ssml_synthesis.rs` – SSML markup support

```bash
# Train a custom vocoder
voirs train vocoder --data /path/to/LJSpeech-1.1 --output checkpoints/my-voice --model-type diffwave

# Monitor training
tail -f checkpoints/my-voice/training.log

# Check the best checkpoint
cat checkpoints/my-voice/best_model.json | jq '{epoch, train_loss}'
```
Pure Rust implementation supporting 9 languages with 54 voices!
VoiRS now supports the Kokoro-82M ONNX model for multilingual speech synthesis:
Key Features:

- `numrs2` for `.npz` loading

Examples:

- `kokoro_japanese_demo.rs` – Japanese TTS
- `kokoro_chinese_demo.rs` – Chinese TTS with tone marks
- `kokoro_multilingual_demo.rs` – All 9 languages
- `kokoro_espeak_auto_demo.rs` – NEW! Automatic IPA generation with eSpeak NG

Full documentation: Kokoro Examples Guide
```bash
# Run the Japanese demo
cargo run --example kokoro_japanese_demo --features onnx --release

# Run all languages
cargo run --example kokoro_multilingual_demo --features onnx --release

# NEW: Automatic IPA generation (7 languages, no manual phonemes needed!)
cargo run --example kokoro_espeak_auto_demo --features onnx --release
```
We welcome contributions! Please see our Contributing Guide for details.
Licensed under either of:
at your option.
Website • Documentation • Community

Built with ❤️ in Rust by the cool-japan team