| Crates.io | text2audio |
| lib.rs | text2audio |
| version | 0.1.1 |
| created_at | 2025-12-28 12:57:56.084996+00 |
| updated_at | 2026-01-02 08:56:11.259812+00 |
| description | A modern Rust crate for converting text to audio with AI-powered segmentation |
| homepage | |
| repository | https://github.com/AnlangA/text2audio |
| max_upload_size | |
| id | 2008791 |
| size | 136,615 |
A high-performance Rust library for converting text to audio files using Zhipu AI's GLM models, featuring intelligent text segmentation, parallel processing, and advanced audio merging capabilities.
An API key from Zhipu AI is required; it is used for intelligent text splitting and semantic analysis:

```bash
export ZHIPU_API_KEY="your_api_key_here"
```
Add to your Cargo.toml:

```toml
[dependencies]
text2audio = "0.1.1"
tokio = { version = "1", features = ["full"] }
```
Basic usage:

```rust
use text2audio::Text2Audio;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("ZHIPU_API_KEY")?;
    let converter = Text2Audio::new(api_key);
    converter.convert("你好，世界！", "output.wav").await?;
    println!("Audio saved to output.wav");
    Ok(())
}
```
```rust
use text2audio::Text2Audio;

let converter = Text2Audio::new(&api_key);
converter.convert("Hello, world!", "hello.wav").await?;
```
```rust
use text2audio::{Text2Audio, Voice};

let converter = Text2Audio::new(&api_key)
    .with_voice(Voice::Xiaochen)
    .with_speed(1.5)   // 50% faster
    .with_volume(2.0); // Louder
converter.convert("加速的语音", "fast.wav").await?;
```
```rust
use text2audio::{Text2Audio, Model};

let long_text = "非常长的文本...";
let converter = Text2Audio::new(&api_key)
    .with_model(Model::GLM4_7)    // Use best model for segmentation
    .with_max_segment_length(300) // Shorter segments for better flow
    .with_thinking(true);         // Enable thinking mode
converter.convert(long_text, "long_audio.wav").await?;
```
```rust
use std::time::Duration;
use text2audio::{Text2Audio, Voice};

let converter = Text2Audio::new(&api_key)
    .with_voice(Voice::Tongtong)
    .with_parallel(5) // Process up to 5 segments concurrently
    .with_retry_config(5, Duration::from_millis(200));
converter.convert(very_long_text, "output.wav").await?;
```
```rust
use std::time::Duration;
use text2audio::{Text2Audio, Model, Voice};

let converter = Text2Audio::builder(&api_key)
    .model(Model::GLM4_7)
    .voice(Voice::Tongtong)
    .speed(1.2)
    .volume(1.5)
    .max_segment_length(500)
    .parallel(3)
    .thinking(true)
    .retry_config(3, Duration::from_millis(100))
    .build();
converter.convert("优化的长文本", "narration.wav").await?;
```
| Method | Type | Range | Default | Description |
|---|---|---|---|---|
| `with_model()` | `Model` | enum | `GLM4_5Flash` | AI model for text segmentation |
| `with_voice()` | `Voice` | enum | `Tongtong` | Voice selection for TTS |
| `with_speed()` | `f32` | 0.5 - 2.0 | 1.0 | Speech speed multiplier |
| `with_volume()` | `f32` | 0.0 - 10.0 | 1.0 | Audio volume level |
| `with_max_segment_length()` | `usize` | 100 - 1024 | 500 | Max characters per segment |
| `with_parallel()` | `usize` | 1 - 10 | disabled | Enable concurrent processing |
| `with_thinking()` | `bool` | true/false | false | Enable AI thinking mode |
| `with_coding_plan()` | `bool` | true/false | false | Use coding plan endpoint |
| `with_retry_config()` | `(u32, Duration)` | custom | (3, 100ms) | Retry attempts and delay |
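To illustrate what `with_max_segment_length()` constrains, here is a naive, purely length-based splitter in standalone Rust. This is a hypothetical stand-in for demonstration only; the crate's actual splitter is AI-powered and semantic, not a simple length cutter:

```rust
// Naive splitter: breaks text at sentence-ending punctuation while keeping
// each segment at or under `max_len` characters. Illustrates the segment
// length constraint only; it is NOT the crate's AI segmentation algorithm.
fn split_text(text: &str, max_len: usize) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    // Keep the delimiter attached to its sentence (Chinese and English marks).
    for sentence in text.split_inclusive(|c: char| "。！？.!?".contains(c)) {
        if !current.is_empty()
            && current.chars().count() + sentence.chars().count() > max_len
        {
            segments.push(std::mem::take(&mut current));
        }
        current.push_str(sentence);
    }
    if !current.is_empty() {
        segments.push(current);
    }
    segments
}

fn main() {
    // Three 4-character sentences; with max_len = 8 the first two fit together.
    let segments = split_text("第一句。第二句。第三句。", 8);
    for (i, s) in segments.iter().enumerate() {
        println!("segment {}: {}", i, s);
    }
}
```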
All voices are provided by Zhipu AI's TTS service:

- `Voice::Tongtong` (童童) - Default female voice, clear and natural
- `Voice::Chuichui` (锤锤) - Warm and friendly male voice
- `Voice::Xiaochen` (晓辰) - Professional narration voice
- `Voice::Jam` - Youthful and energetic voice
- `Voice::Kazi` - Deep and authoritative voice
- `Voice::Douji` (豆鸡) - Cute and playful voice
- `Voice::Luodo` - Mature and calm voice

Choose the appropriate model based on your needs:
The library provides detailed error types for robust error handling:
```rust
use text2audio::{Text2Audio, Error};

match converter.convert(text, "output.wav").await {
    Ok(_) => println!("✓ Conversion successful"),
    Err(Error::EmptyInput) => eprintln!("✗ Error: Input text is empty"),
    Err(Error::TtsApi(msg)) => eprintln!("✗ TTS API Error: {}", msg),
    Err(Error::AiApi(msg)) => eprintln!("✗ AI API Error: {}", msg),
    Err(Error::Audio(msg)) => eprintln!("✗ Audio Processing Error: {}", msg),
    Err(Error::Io(e)) => eprintln!("✗ File I/O Error: {}", e),
    Err(e) => eprintln!("✗ Unexpected Error: {}", e),
}
```
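Transient TTS and AI API errors are the typical targets of `with_retry_config(attempts, delay)`. The semantics can be sketched as a generic retry loop; this standalone function is an illustration of the attempts-plus-delay behavior, not the crate's actual implementation:

```rust
use std::time::Duration;

// Try `op` up to `attempts` times, sleeping `delay` between failed attempts.
// Returns the first success, or the last error once attempts are exhausted.
fn retry<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last_err = None;
    for attempt in 0..attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) => {
                last_err = Some(e);
                if attempt + 1 < attempts {
                    std::thread::sleep(delay);
                }
            }
        }
    }
    // Panics only if called with attempts == 0.
    Err(last_err.expect("attempts must be > 0"))
}

fn main() {
    let mut calls = 0;
    // Simulated operation: fails twice, then succeeds on the third attempt.
    let result = retry(3, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok(calls) }
    });
    assert_eq!(result, Ok(3));
    println!("succeeded after {} calls", calls);
}
```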
```text
text2audio/
├── src/
│   ├── lib.rs          # Main API and Text2Audio struct
│   ├── client.rs       # Zhipu AI API client
│   ├── ai_splitter.rs  # AI-powered text segmentation
│   ├── audio_merger.rs # WAV audio file merging
│   ├── config.rs       # Voice and configuration types
│   └── error.rs        # Error types and Result alias
├── examples/           # Usage examples
├── assets/             # Sample text files
└── target/             # Build output
```
The project includes comprehensive examples demonstrating various features:

- `cargo run --example simple` - Converts a short Chinese text to audio using default settings.
- `cargo run --example ai_splitter` - Demonstrates AI-powered semantic segmentation for long texts.
- `cargo run --example custom_voice` - Shows voice customization and parameter tuning.
- `cargo run --example parallel` - Illustrates concurrent audio generation for performance.
- `cargo run --example from_file` - Converts text from a file with optimized settings for long-form content.
- `cargo run --example ai_splitter` - Demonstrates direct usage of the AiSplitter component.
Use `with_parallel(3-5)` for long texts to significantly reduce total conversion time.

This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Areas for improvement:
Please feel free to submit issues, feature requests, or pull requests.