| Crates.io | voirs-g2p |
|---|---|
| lib.rs | voirs-g2p |
| version | 0.1.0-alpha.1 |
| created_at | 2025-09-21 03:36:48.282128+00 |
| updated_at | 2025-09-21 03:36:48.282128+00 |
| description | Grapheme-to-Phoneme conversion for VoiRS speech synthesis |
| homepage | https://github.com/cool-japan/voirs |
| repository | https://github.com/cool-japan/voirs |
| max_upload_size | |
| id | 1848392 |
| size | 1,577,518 |
Grapheme-to-Phoneme (G2P) conversion for the VoiRS speech synthesis framework.
This crate provides high-quality text-to-phoneme conversion with support for multiple languages and backends. It serves as the first stage in the VoiRS speech synthesis pipeline, converting input text into phonetic representations that can be processed by acoustic models.
```rust
use voirs_g2p::{G2p, PhoneticusG2p, Phoneme};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize English G2P with Phonetisaurus backend
    let g2p = PhoneticusG2p::new("en-US").await?;

    // Convert text to phonemes
    let phonemes: Vec<Phoneme> = g2p.to_phonemes("Hello world!", None).await?;

    // Print phonetic representation
    for phoneme in phonemes {
        println!("{}", phoneme.symbol);
    }

    Ok(())
}
```
| Language | Backend | Accuracy | Status |
|---|---|---|---|
| English (US) | Phonetisaurus | 95.2% | ✅ Stable |
| English (UK) | Phonetisaurus | 94.8% | ✅ Stable |
| Japanese | OpenJTalk | 92.1% | ✅ Stable |
| Spanish | Neural G2P | 89.3% | 🚧 Beta |
| French | Neural G2P | 88.7% | 🚧 Beta |
| German | Neural G2P | 88.1% | 🚧 Beta |
| Mandarin | Neural G2P | 85.9% | 🚧 Beta |
```text
Text Input → Preprocessing → Language Detection → Backend Selection →     Phonemes
    ↓              ↓                 ↓                    ↓                   ↓
 "Hello"        "hello"           "en-US"          Phonetisaurus     [HH, AH, L, OW]
```
The pipeline runs in four stages:

- Text Preprocessing
- Language Detection
- Backend Routing
- Phoneme Generation
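As an illustration only, the sketch below strings these four stages together through the `G2p` trait methods documented later in this README; it assumes the crate re-exports a `Result` alias, and backend routing happens inside whichever implementation you pass in.

```rust
use voirs_g2p::{G2p, Phoneme, Result};

/// Illustrative composition of the four pipeline stages via the `G2p` trait.
async fn run_pipeline(g2p: &impl G2p, text: &str) -> Result<Vec<Phoneme>> {
    // Stage 1: text preprocessing (normalization, number expansion, ...)
    let normalized = g2p.preprocess(text, None).await?;

    // Stage 2: language detection when no explicit language hint is given
    let detected = g2p.detect_language(&normalized).await?;
    println!("detected language: {:?}", detected);

    // Stages 3-4: backend selection and phoneme generation
    g2p.to_phonemes(&normalized, None).await
}
```

The trait that every backend implements is shown next.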
```rust
#[async_trait]
pub trait G2p: Send + Sync {
    /// Convert text to phonemes for given language
    async fn to_phonemes(&self, text: &str, lang: Option<&str>) -> Result<Vec<Phoneme>>;

    /// Get list of supported language codes
    fn supported_languages(&self) -> Vec<LanguageCode>;

    /// Get backend metadata and capabilities
    fn metadata(&self) -> G2pMetadata;

    /// Preprocess text before phoneme conversion
    async fn preprocess(&self, text: &str, lang: Option<&str>) -> Result<String>;

    /// Detect language of input text
    async fn detect_language(&self, text: &str) -> Result<LanguageCode>;
}
```
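Third-party backends and wrappers implement this same trait. The following is a hypothetical decorator, not part of the crate: it delegates to an inner backend and drops low-confidence phonemes, assuming the crate re-exports `G2pMetadata`, a `Result` alias, and that the trait uses the `async_trait` macro as shown above.

```rust
use async_trait::async_trait;
use voirs_g2p::{G2p, G2pMetadata, LanguageCode, Phoneme, Result};

/// Hypothetical wrapper: delegates to an inner backend and discards
/// phonemes whose confidence falls below a threshold.
pub struct ConfidenceFilter<B: G2p> {
    inner: B,
    min_confidence: f32,
}

#[async_trait]
impl<B: G2p> G2p for ConfidenceFilter<B> {
    async fn to_phonemes(&self, text: &str, lang: Option<&str>) -> Result<Vec<Phoneme>> {
        let phonemes = self.inner.to_phonemes(text, lang).await?;
        Ok(phonemes
            .into_iter()
            .filter(|p| p.confidence >= self.min_confidence)
            .collect())
    }

    fn supported_languages(&self) -> Vec<LanguageCode> {
        self.inner.supported_languages()
    }

    fn metadata(&self) -> G2pMetadata {
        self.inner.metadata()
    }

    async fn preprocess(&self, text: &str, lang: Option<&str>) -> Result<String> {
        self.inner.preprocess(text, lang).await
    }

    async fn detect_language(&self, text: &str) -> Result<LanguageCode> {
        self.inner.detect_language(text).await
    }
}
```

Such a wrapper composes with any backend. The phoneme type it filters is defined as follows.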
```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Phoneme {
    /// IPA symbol (e.g., "æ", "t̪", "d͡ʒ")
    pub symbol: String,
    /// Stress level (0=none, 1=primary, 2=secondary)
    pub stress: u8,
    /// Position within syllable
    pub syllable_position: SyllablePosition,
    /// Predicted duration in milliseconds
    pub duration_ms: Option<f32>,
    /// Confidence score (0.0-1.0)
    pub confidence: f32,
}
```
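Because the fields are public, downstream code can inspect converted output directly. A small illustrative helper (not part of the crate) that aggregates durations and stress:

```rust
use voirs_g2p::Phoneme;

/// Sum predicted durations (where present) and count primary-stressed phonemes.
fn summarize(phonemes: &[Phoneme]) -> (f32, usize) {
    let total_ms: f32 = phonemes.iter().filter_map(|p| p.duration_ms).sum();
    let primary = phonemes.iter().filter(|p| p.stress == 1).count();
    (total_ms, primary)
}
```

Languages themselves are identified by an enum.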
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum LanguageCode {
    EnUs, // English (US)
    EnGb, // English (UK)
    JaJp, // Japanese
    EsEs, // Spanish (Spain)
    EsMx, // Spanish (Mexico)
    FrFr, // French (France)
    DeDe, // German (Germany)
    ZhCn, // Chinese (Simplified)
    // ... more languages
}
```
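Elsewhere in this README languages are referred to by BCP-47 tags such as "en-US". A hypothetical helper mapping such tags onto the enum might look like the sketch below; the real crate may already provide an equivalent conversion.

```rust
use voirs_g2p::LanguageCode;

/// Hypothetical mapping from BCP-47 tags to `LanguageCode` variants;
/// unknown tags yield `None`.
fn parse_language_tag(tag: &str) -> Option<LanguageCode> {
    match tag {
        "en-US" => Some(LanguageCode::EnUs),
        "en-GB" => Some(LanguageCode::EnGb),
        "ja-JP" => Some(LanguageCode::JaJp),
        "es-ES" => Some(LanguageCode::EsEs),
        "es-MX" => Some(LanguageCode::EsMx),
        "fr-FR" => Some(LanguageCode::FrFr),
        "de-DE" => Some(LanguageCode::DeDe),
        "zh-CN" => Some(LanguageCode::ZhCn),
        _ => None,
    }
}
```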
```rust
use voirs_g2p::{PhoneticusG2p, G2p};

let g2p = PhoneticusG2p::new("en-US").await?;
let phonemes = g2p.to_phonemes("The quick brown fox.", None).await?;

// Convert to IPA string
let ipa: String = phonemes.iter()
    .map(|p| p.symbol.as_str())
    .collect::<Vec<_>>()
    .join(" ");

println!("IPA: {}", ipa);
```
```rust
use voirs_g2p::{MultilingualG2p, OpenJTalkG2p, PhoneticusG2p, G2p};

let g2p = MultilingualG2p::builder()
    .add_backend("en", PhoneticusG2p::new("en-US").await?)
    .add_backend("ja", OpenJTalkG2p::new().await?)
    .build();

// Automatic language detection
let text = "Hello world! こんにちは世界!";
let phonemes = g2p.to_phonemes(text, None).await?;
```
```rust
use voirs_g2p::{SsmlG2p, PhoneticusG2p, G2p};

let g2p = SsmlG2p::new(PhoneticusG2p::new("en-US").await?);

let ssml = r#"
<speak>
  <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>
  versus
  <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>
</speak>
"#;

let phonemes = g2p.to_phonemes(ssml, Some("en-US")).await?;
```
```rust
use voirs_g2p::{BatchG2p, PhoneticusG2p, G2p};

let g2p = PhoneticusG2p::new("en-US").await?;
let batch_g2p = BatchG2p::new(g2p, 32); // batch size of 32

let texts = vec![
    "First sentence.",
    "Second sentence.",
    "Third sentence.",
];

let results = batch_g2p.to_phonemes_batch(&texts, None).await?;
```
```rust
use voirs_g2p::{G2p, PhoneticusG2p, TextPreprocessor};

let mut preprocessor = TextPreprocessor::new("en-US");
preprocessor.add_rule(r"\$(\d+)", |caps| {
    format!("{} dollars", caps[1].parse::<i32>().unwrap())
});

let g2p = PhoneticusG2p::with_preprocessor("en-US", preprocessor).await?;
let phonemes = g2p.to_phonemes("It costs $5.99", None).await?;
```
| Backend | Latency (1 sentence) | Throughput (batch) | Memory Usage |
|---|---|---|---|
| Phonetisaurus | 0.3ms | 2,500 sent/s | 50MB |
| OpenJTalk | 0.8ms | 1,200 sent/s | 100MB |
| Neural G2P | 2.1ms | 800 sent/s | 20MB |
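These figures depend on hardware and installed model files. As a rough, illustrative check of single-sentence latency on your own machine, you can time a call directly; the sketch below reuses the quick-start API with `std::time::Instant` rather than the crate's benchmark suite.

```rust
use std::time::Instant;
use voirs_g2p::{G2p, PhoneticusG2p};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let g2p = PhoneticusG2p::new("en-US").await?;

    let start = Instant::now();
    let phonemes = g2p
        .to_phonemes("The quick brown fox jumps over the lazy dog.", None)
        .await?;
    println!("{} phonemes in {:?}", phonemes.len(), start.elapsed());

    Ok(())
}
```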
Add to your `Cargo.toml`:

```toml
[dependencies]
voirs-g2p = "0.1"
```

To enable optional backends:

```toml
[dependencies.voirs-g2p]
version = "0.1"
features = ["phonetisaurus", "openjtalk", "neural"]
```
- `phonetisaurus`: Enable Phonetisaurus FST backend
- `openjtalk`: Enable OpenJTalk Japanese backend
- `neural`: Enable neural LSTM backend
- `all-backends`: Enable all available backends
- `cli`: Enable command-line binary

Phonetisaurus backend:
```bash
# Ubuntu/Debian
sudo apt-get install libfst-dev

# macOS
brew install openfst
```
OpenJTalk backend:
```bash
# Ubuntu/Debian
sudo apt-get install libopenjtalk-dev

# macOS
brew install open-jtalk
```
Create `~/.voirs/g2p.toml`:

```toml
[default]
language = "en-US"
backend = "phonetisaurus"

[preprocessing]
expand_numbers = true
expand_abbreviations = true
normalize_unicode = true

[phonetisaurus]
model_path = "~/.voirs/models/g2p/"
cache_size = 10000

[openjtalk]
dictionary_path = "/usr/share/open-jtalk/dic"
voice_path = "/usr/share/open-jtalk/voice"

[neural]
model_path = "~/.voirs/models/neural-g2p/"
device = "cpu"  # or "cuda:0"
```
```rust
use voirs_g2p::{G2pError, ErrorKind};

match g2p.to_phonemes("text", None).await {
    Ok(phonemes) => println!("Success: {} phonemes", phonemes.len()),
    Err(G2pError { kind, context, .. }) => match kind {
        ErrorKind::UnsupportedLanguage => {
            eprintln!("Language not supported: {}", context);
        }
        ErrorKind::ModelNotFound => {
            eprintln!("Model files missing: {}", context);
        }
        ErrorKind::ParseError => {
            eprintln!("Failed to parse input: {}", context);
        }
        _ => eprintln!("Other error: {}", context),
    },
}
```
We welcome contributions! Please see the main repository for contribution guidelines.
```bash
git clone https://github.com/cool-japan/voirs.git
cd voirs/crates/voirs-g2p

# Install development dependencies
cargo install cargo-nextest

# Run tests
cargo nextest run

# Run benchmarks
cargo bench

# Check code quality
cargo clippy -- -D warnings
cargo fmt --check
```
To add support for a new language:

- Implement the `G2p` trait for your language
- Add the language to the `LanguageCode` enum

Licensed under either of:

at your option.