| Crates.io | voirs |
| lib.rs | voirs |
| version | 0.1.0-alpha.1 |
| created_at | 2025-07-04 11:16:48.344093+00 |
| updated_at | 2025-09-21 05:51:40.893186+00 |
| description | Advanced voice synthesis and speech processing library for Rust |
| homepage | https://github.com/cool-japan/voirs |
| repository | https://github.com/cool-japan/voirs |
| id | 1737825 |
| size | 1,034,591 |
Democratize state-of-the-art speech synthesis with a fully open, memory-safe, and hardware-portable stack built 100% in Rust.
VoiRS is a cutting-edge Text-to-Speech (TTS) framework that unifies high-performance crates from the cool-japan ecosystem (SciRS2, NumRS2, PandRS, TrustformeRS) into a cohesive neural speech synthesis solution.
🚀 Alpha Release (0.1.0-alpha.1): This is the first public alpha of VoiRS. Core TTS functionality is working and ready for evaluation, but APIs may change and some advanced features are still in development. Perfect for early adopters and researchers!
# Install CLI tool
cargo install voirs-cli
# Or add to your Rust project
cargo add voirs
use voirs::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Build a synthesis pipeline with the chosen voice.
    let pipeline = VoirsPipeline::builder()
        .with_voice("en-US-female-calm")
        .build()
        .await?;

    // Synthesize the text to audio.
    let audio = pipeline
        .synthesize("Hello, world! This is VoiRS speaking in pure Rust.")
        .await?;

    // Write the result to a WAV file.
    audio.save_wav("output.wav")?;
    Ok(())
}
# Basic synthesis
voirs synth "Hello world" output.wav
# With voice selection
voirs synth "Hello world" output.wav --voice en-US-male-energetic
# SSML support
voirs synth '<speak><emphasis level="strong">Hello</emphasis> world!</speak>' output.wav
# Streaming synthesis
voirs synth --stream "Long text content..." output.wav
# List available voices
voirs voices list
VoiRS follows a modular pipeline architecture:
Text Input → G2P      → Acoustic Model   → Vocoder → Audio Output
    ↓         ↓                ↓              ↓           ↓
   SSML    Phonemes    Mel Spectrograms    Neural      WAV/OGG
| Component | Description | Backends |
|---|---|---|
| G2P | Grapheme-to-Phoneme conversion | Phonetisaurus, OpenJTalk, Neural |
| Acoustic | Text → Mel spectrogram | VITS, FastSpeech2 |
| Vocoder | Mel → Waveform | HiFi-GAN, DiffWave |
| Dataset | Training data utilities | LJSpeech, JVS, Custom |
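The staged design above can be sketched with one trait per stage, so that each backend is swappable on its own. This is a minimal illustration only: the trait names, the toy implementations, and the `Pipeline` struct are all hypothetical and are not the VoiRS API. The 80-band frame and the 256-sample hop are likewise assumed values.

```rust
/// Hypothetical stage traits for illustration; these are NOT the VoiRS API.
trait G2p {
    fn to_phonemes(&self, text: &str) -> Vec<String>;
}

trait AcousticModel {
    /// Returns one mel frame (a vector of band energies) per phoneme.
    fn to_mel(&self, phonemes: &[String]) -> Vec<Vec<f32>>;
}

trait Vocoder {
    fn to_waveform(&self, mel: &[Vec<f32>]) -> Vec<f32>;
}

/// Toy G2P: treats every alphabetic character as its own "phoneme".
struct CharG2p;

impl G2p for CharG2p {
    fn to_phonemes(&self, text: &str) -> Vec<String> {
        text.chars()
            .filter(|c| c.is_alphabetic())
            .map(|c| c.to_lowercase().to_string())
            .collect()
    }
}

/// Toy acoustic model: emits a constant frame of `bands` zeros per phoneme.
struct ConstAcoustic {
    bands: usize,
}

impl AcousticModel for ConstAcoustic {
    fn to_mel(&self, phonemes: &[String]) -> Vec<Vec<f32>> {
        phonemes.iter().map(|_| vec![0.0; self.bands]).collect()
    }
}

/// Toy vocoder: emits `hop` silent samples per mel frame.
struct SilentVocoder {
    hop: usize,
}

impl Vocoder for SilentVocoder {
    fn to_waveform(&self, mel: &[Vec<f32>]) -> Vec<f32> {
        vec![0.0; mel.len() * self.hop]
    }
}

/// Chains the three stages behind trait objects.
struct Pipeline {
    g2p: Box<dyn G2p>,
    acoustic: Box<dyn AcousticModel>,
    vocoder: Box<dyn Vocoder>,
}

impl Pipeline {
    fn synthesize(&self, text: &str) -> Vec<f32> {
        let phonemes = self.g2p.to_phonemes(text);
        let mel = self.acoustic.to_mel(&phonemes);
        self.vocoder.to_waveform(&mel)
    }
}

fn main() {
    let pipeline = Pipeline {
        g2p: Box::new(CharG2p),
        acoustic: Box::new(ConstAcoustic { bands: 80 }),
        vocoder: Box::new(SilentVocoder { hop: 256 }),
    };
    let samples = pipeline.synthesize("Hello, world!");
    println!("{} samples", samples.len());
}
```

Because each stage sits behind a trait object, swapping the vocoder (for example) does not touch the G2P or acoustic code, which is the property the component table describes.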
voirs/
├── crates/
│ ├── voirs-g2p/ # Grapheme-to-Phoneme conversion
│ ├── voirs-acoustic/ # Neural acoustic models (VITS)
│ ├── voirs-vocoder/ # Neural vocoders (HiFi-GAN/DiffWave)
│ ├── voirs-dataset/ # Dataset loading and preprocessing
│ ├── voirs-cli/ # Command-line interface
│ ├── voirs-ffi/ # C/Python bindings
│ └── voirs-sdk/ # Unified public API
├── models/ # Pre-trained model zoo
└── examples/ # Usage examples
# Clone repository
git clone https://github.com/cool-japan/voirs.git
cd voirs
# CPU-only build
cargo build --release
# GPU-accelerated build
cargo build --release --features gpu
# WebAssembly build
cargo build --target wasm32-unknown-unknown --release
# All features
cargo build --release --all-features
# Run tests
cargo nextest run --no-fail-fast
# Run benchmarks
cargo bench
# Check code quality
cargo clippy --all-targets --all-features -- -D warnings
cargo fmt --check
| Language | G2P Backend | Status | Quality |
|---|---|---|---|
| English (US) | Phonetisaurus | ✅ Production | MOS 4.5 |
| English (UK) | Phonetisaurus | ✅ Production | MOS 4.4 |
| Japanese | OpenJTalk | ✅ Production | MOS 4.3 |
| Spanish | Neural G2P | 🚧 Beta | MOS 4.1 |
| French | Neural G2P | 🚧 Beta | MOS 4.0 |
| German | Neural G2P | 🚧 Beta | MOS 4.0 |
| Mandarin | Neural G2P | 🚧 Beta | MOS 3.9 |
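For readers unfamiliar with the Quality column: a mean opinion score (MOS) is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale. The sketch below shows only that arithmetic. The function name and the sample ratings are illustrative assumptions and are not VoiRS evaluation data or tooling.

```rust
/// Mean opinion score: arithmetic mean of listener ratings on a 1-5 scale.
/// Returns `None` for empty input or if any rating falls outside 1..=5.
fn mean_opinion_score(ratings: &[u8]) -> Option<f64> {
    if ratings.is_empty() || ratings.iter().any(|r| !(1..=5).contains(r)) {
        return None;
    }
    let sum: u32 = ratings.iter().map(|&r| u32::from(r)).sum();
    Some(f64::from(sum) / ratings.len() as f64)
}

fn main() {
    // Illustrative ratings only; not taken from any VoiRS evaluation.
    let ratings = [5, 4, 5, 4, 4, 5, 4, 5];
    match mean_opinion_score(&ratings) {
        Some(mos) => println!("MOS = {mos:.1}"),
        None => println!("no valid ratings"),
    }
}
```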
| Hardware | Backend | RTF (real-time factor, lower is faster) | Notes |
|---|---|---|---|
| Intel i7-12700K | CPU | 0.28× | 8-core, 22 kHz synthesis |
| Apple M2 Pro | CPU | 0.25× | 12-core, 22 kHz synthesis |
| RTX 4080 | CUDA | 0.04× | Batch size 1, 22 kHz |
| RTX 4090 | CUDA | 0.03× | Batch size 1, 22 kHz |
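To read the RTF figures: real-time factor is commonly defined as synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The sketch below assumes that definition applies to the table. The helper names are hypothetical, and the 10-second clip is an arbitrary example rather than a measured result.

```rust
use std::time::Duration;

/// Real-time factor: synthesis time divided by the duration of the audio produced.
fn real_time_factor(synthesis_time: Duration, audio_duration: Duration) -> f64 {
    synthesis_time.as_secs_f64() / audio_duration.as_secs_f64()
}

/// Estimated wall-clock time to synthesize `audio_duration` of speech at a given RTF.
fn estimated_synthesis_time(rtf: f64, audio_duration: Duration) -> Duration {
    audio_duration.mul_f64(rtf)
}

fn main() {
    // Arbitrary example clip length, not a benchmark input.
    let clip = Duration::from_secs(10);

    // Taking 2.8 s to synthesize 10 s of audio corresponds to an RTF of about 0.28.
    let rtf = real_time_factor(Duration::from_millis(2800), clip);
    println!("RTF = {rtf:.2}");

    // At an RTF of 0.04, the same 10 s clip would take roughly 0.4 s.
    let est = estimated_synthesis_time(0.04, clip);
    println!("estimated time = {:.2} s", est.as_secs_f64());
}
```

Under this reading, the CPU rows correspond to roughly 3.5 to 4 times faster than real time, and the CUDA rows to roughly 25 to 33 times faster.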
Explore the examples/ directory for comprehensive usage patterns:
- simple_synthesis.rs: Basic text-to-speech
- batch_synthesis.rs: Process multiple inputs
- streaming_synthesis.rs: Real-time synthesis
- ssml_synthesis.rs: SSML markup support

We welcome contributions! Please see our Contributing Guide for details.
Licensed under either of:
at your option.
Built with ❤️ in Rust by the cool-japan team