| Crates.io | kokoroxide |
| lib.rs | kokoroxide |
| version | 0.1.5 |
| created_at | 2025-10-03 09:15:23.940621+00 |
| updated_at | 2025-10-04 10:28:39.81734+00 |
| description | A Rust implementation of Kokoro TTS (Text-to-Speech) synthesis |
| homepage | |
| repository | https://github.com/dhruv304c2/kokoroxide |
| max_upload_size | |
| id | 1866408 |
| size | 129,988 |
A high-performance Rust implementation of Kokoro TTS (Text-to-Speech) synthesis that uses ONNX Runtime for neural speech generation. It relies on espeak-ng for text-to-phoneme conversion and includes built-in logic that converts the result into the Misaki phoneme notation expected by Kokoro models. Distributed under a dual MIT/Apache-2.0 license to match the broader Rust ecosystem.
Note: Currently, only American English is supported and tested. Contributions for other languages are very welcome!
Add this to your Cargo.toml:
[dependencies]
kokoroxide = "0.1.3"
use kokoroxide::{load_voice_style, KokoroTTS, TTSConfig};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure the ONNX model + tokenizer that Kokoro requires.
    // These files live outside the crate; download them from Kokoro's distribution (https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX).
    let config = TTSConfig::new("path/to/kokoro.onnx", "path/to/tokenizer.json")
        .with_sample_rate(24000)
        .with_max_tokens_length(512)
        .with_graph_optimization_level(kokoroxide::GraphOptimizationLevel::Disable);

    // Build the speech engine with the explicit configuration so advanced knobs are available.
    let tts_service = KokoroTTS::with_config(config)?;

    // Load a voice style vector (.bin) that controls prosody and speaker identity.
    let voice = load_voice_style("path/to/voice.bin")?;

    // Generate speech at 1.0x speed for the requested text.
    let text = "Hello, this is a text-to-speech synthesis example.";
    let audio = tts_service.generate_speech(text, &voice, 1.0)?;

    // Persist the synthesized waveform to a WAV file for playback.
    audio.save_to_wav("path/to/output.wav")?;

    Ok(())
}
For a complete runnable example pointing at real assets, see the kokoroxide-demo sample project in this workspace (kokoroxide-demo/src/main.rs).
KokoroTTS
The main TTS engine that handles text-to-speech conversion.
// Create with default config
let tts = KokoroTTS::new(model_path, tokenizer_path)?;
// Create with custom config
let config = TTSConfig::new(model_path, tokenizer_path)
    .with_max_tokens_length(128)
    .with_sample_rate(24000);
let tts = KokoroTTS::with_config(config)?;
VoiceStyle
Represents voice characteristics as a style vector. Voice files contain multiple style vectors indexed by token length.
// Load from binary file
let voice = load_voice_style("voice.bin")?;
// Create custom voice with vector size
let custom_voice = VoiceStyle::new(vec![0.1, 0.2, ...], 256);
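To make "multiple style vectors indexed by token length" concrete, here is a minimal sketch using only the standard library. It is not kokoroxide's loader: the on-disk layout (raw little-endian f32 values, one fixed-size row per token length) and both helper names, parse_style_vectors and select_style, are assumptions made for illustration.

```rust
/// Hypothetical helper: split a raw buffer of little-endian f32 values into
/// rows of `dim` floats each. The file layout is an assumption, not a
/// documented kokoroxide format.
fn parse_style_vectors(bytes: &[u8], dim: usize) -> Result<Vec<Vec<f32>>, String> {
    // Reject empty input and buffers that do not hold a whole number of rows.
    if dim == 0 || bytes.is_empty() || bytes.len() % (dim * 4) != 0 {
        return Err(format!(
            "buffer of {} bytes is not a whole number of {}-float rows",
            bytes.len(),
            dim
        ));
    }
    Ok(bytes
        .chunks_exact(dim * 4)
        .map(|row| {
            row.chunks_exact(4)
                .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
                .collect::<Vec<f32>>()
        })
        .collect())
}

/// Hypothetical helper: pick the row for a given token count, falling back to
/// the last row when the input is longer than the table.
fn select_style(styles: &[Vec<f32>], token_len: usize) -> Option<&[f32]> {
    let last = styles.len().checked_sub(1)?;
    Some(styles[token_len.min(last)].as_slice())
}
```

With a 256-float row size, parse_style_vectors(&bytes, 256) would yield one vector per row, and select_style(&styles, tokens.len()) would return the row matching the input length. Whether the crate clamps out-of-range lengths this way is not stated in this README.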
GeneratedAudio
Contains the generated audio samples and metadata.
let audio = tts.speak("Hello!", &voice)?;
println!("Duration: {} seconds", audio.duration_seconds);
println!("Sample rate: {} Hz", audio.sample_rate);
audio.save_to_wav("output.wav")?;
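For readers curious how the duration and WAV output relate to the raw samples, the following sketch shows one common approach: duration is sample count divided by sample rate, and a mono 16-bit PCM WAV file is a 44-byte RIFF header followed by the samples. This is an illustration only. The function names are hypothetical, and the README does not say which sample format save_to_wav actually writes.

```rust
/// Duration in seconds of a mono buffer at the given sample rate.
fn duration_seconds(sample_count: usize, sample_rate: u32) -> f32 {
    sample_count as f32 / sample_rate as f32
}

/// Encode mono f32 samples (nominally in [-1.0, 1.0]) as a 16-bit PCM WAV
/// byte buffer. Hypothetical helper, not the crate's implementation.
fn encode_wav_pcm16(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32; // 2 bytes per sample
    let mut out = Vec::with_capacity(44 + samples.len() * 2);

    // RIFF container header.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");

    // "fmt " chunk describing uncompressed mono 16-bit PCM.
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format: PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // bytes per second
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample

    // "data" chunk with the samples, clamped and scaled to i16.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * 32767.0).round() as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Writing the returned bytes with std::fs::write produces a file most audio players accept. At the 24000 Hz rate used throughout this README, 48000 samples correspond to two seconds of audio.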
let audio = tts.speak("Hello, world!", &voice)?;
let audio = tts.generate_speech("Speak faster!", &voice, 1.5)?; // 1.5x speed
let audio = tts.generate_speech_from_phonemes("həˈloʊ wɜːld", &voice, 1.0)?;
let tokens = vec![101, 2234, 1567, 102]; // Pre-tokenized input
let audio = tts.generate_from_tokens(&tokens, &voice, 1.0)?;
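The token-based entry point takes IDs that have already been produced from phonemes. As a rough sketch of what such a mapping step can look like, the function below looks up each phoneme character in a vocabulary and caps the result at a maximum length, similar in spirit to with_max_tokens_length. The vocabulary contents, the choice to drop unknown symbols, and the function name are all assumptions; the real mapping comes from tokenizer.json.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map each phoneme character to a token ID, silently
/// dropping symbols missing from the vocabulary and truncating to `max_len`.
fn phonemes_to_tokens(phonemes: &str, vocab: &HashMap<char, i64>, max_len: usize) -> Vec<i64> {
    phonemes
        .chars()
        .filter_map(|c| vocab.get(&c).copied())
        .take(max_len)
        .collect()
}
```

Truncating after the lookup means unknown symbols do not count toward the length limit. A production tokenizer may instead report unknown symbols or add boundary tokens, so treat this only as a mental model.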
use ort::{execution_providers::CoreMLExecutionProviderOptions, ExecutionProvider, GraphOptimizationLevel};
let config = TTSConfig::new(model_path, tokenizer_path)
    .with_max_tokens_length(512) // Maximum token sequence length
    .with_sample_rate(24000) // Audio sample rate in Hz
    .with_graph_optimization_level(GraphOptimizationLevel::Level3)
    .with_execution_providers(vec![
        ExecutionProvider::CoreML(CoreMLExecutionProviderOptions::default()),
    ]); // Optional hardware acceleration
If you don't need custom providers, you can skip the call to with_execution_providers and the default CPU provider will be used.
The with_graph_optimization_level() method allows you to control ONNX Runtime's graph optimization:
GraphOptimizationLevel::Disable - No optimizations
GraphOptimizationLevel::Level1 - Basic optimizations
GraphOptimizationLevel::Level2 - Extended optimizations
GraphOptimizationLevel::Level3 - Maximum optimizations (default)
Rust 1.70+
espeak-ng (required for text-to-phoneme conversion):
sudo apt-get install espeak-ng libespeak-ng-dev # Debian/Ubuntu
brew install espeak-ng # macOS (Homebrew)
sudo pacman -S espeak-ng # Arch Linux
ONNX Runtime (automatically downloaded via ort crate)
Kokoro model files:
Model file (kokoro-v0_19.onnx)
Tokenizer (tokenizer.json)
Voice files (.bin format)
The crate automatically links to espeak-ng based on your platform:
macOS: /opt/homebrew/lib (Homebrew default)
If espeak-ng is installed in a non-standard location, you may need to set:
export LD_LIBRARY_PATH=/path/to/espeak-ng/lib:$LD_LIBRARY_PATH # Linux
export DYLD_LIBRARY_PATH=/path/to/espeak-ng/lib:$DYLD_LIBRARY_PATH # macOS
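When linking fails, it can help to confirm that the directories in those variables actually contain the espeak-ng shared library. The sketch below is a standalone diagnostic, not part of kokoroxide. The helper names are hypothetical, and the two file names checked are the usual ones on Linux and macOS; versioned names such as libespeak-ng.so.1 are not covered.

```rust
use std::env;
use std::path::PathBuf;

/// File names the espeak-ng shared library usually has on Linux and macOS.
static LIB_NAMES: [&str; 2] = ["libespeak-ng.so", "libespeak-ng.dylib"];

/// Hypothetical helper: return the first existing library file found in `dirs`.
fn find_espeak_in(dirs: &[PathBuf]) -> Option<PathBuf> {
    dirs.iter()
        .flat_map(|dir| LIB_NAMES.iter().map(move |name| dir.join(name)))
        .find(|candidate| candidate.is_file())
}

/// Hypothetical helper: collect the directories listed in the two
/// loader-path variables mentioned above, skipping variables that are unset.
fn search_dirs_from_env() -> Vec<PathBuf> {
    ["LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH"]
        .iter()
        .filter_map(|key| env::var_os(key))
        .flat_map(|value| env::split_paths(&value).collect::<Vec<_>>())
        .collect()
}
```

Calling find_espeak_in(&search_dirs_from_env()) returns the path of the first match, or None when neither variable points at a directory containing the library.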
DEBUG_PHONEMES - Enable phoneme debugging output:
DEBUG_PHONEMES=1 cargo run
This will print:
DEBUG_TOKENS - Enable token debugging output:
DEBUG_TOKENS=1 cargo run
This will print:
DEBUG_TIMING - Enable performance timing logs:
DEBUG_TIMING=1 cargo run
This will print:
All debug modes:
DEBUG_PHONEMES=1 DEBUG_TOKENS=1 DEBUG_TIMING=1 cargo run
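If you want the same kind of switch in your own application, an environment flag can be read with the standard library alone. The sketch below treats a variable as enabled when it is set to anything other than an empty string or "0". That interpretation is an assumption chosen for the example; this README does not specify how kokoroxide parses its DEBUG_* variables, and both function names are hypothetical.

```rust
use std::env;

/// Hypothetical rule: a flag is on when it is set, non-empty, and not "0".
fn flag_enabled(value: Option<&str>) -> bool {
    match value {
        Some(v) => !v.is_empty() && v != "0",
        None => false,
    }
}

/// Hypothetical helper: read a flag such as DEBUG_TIMING from the environment.
/// Unset variables and values that are not valid Unicode count as disabled.
fn debug_flag(name: &str) -> bool {
    flag_enabled(env::var(name).ok().as_deref())
}
```

A caller could then guard extra logging with a check such as debug_flag("DEBUG_TIMING") before printing timing information.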
Download the Kokoro model files from the official repository:
*.onnx - The model file
tokenizer.json - Tokenizer configuration
Voice files (*.bin) - Style vectors for different voices
use kokoroxide::{KokoroTTS, load_voice_style};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let tts = KokoroTTS::new("model.onnx", "tokenizer.json")?;
    let voice = load_voice_style("voice.bin")?;

    let text = "Welcome to kokoroxide TTS!";
    let audio = tts.generate_speech(text, &voice, 1.0)?;

    audio.save_to_wav("welcome.wav")?;
    println!("Generated {} seconds of audio", audio.duration_seconds);

    Ok(())
}
Licensed under either of the MIT license or the Apache License, Version 2.0, at your option.
Contributions are welcome! Please feel free to submit a Pull Request.
This project implements the Kokoro TTS model in Rust, providing a high-performance alternative to Python implementations.