voirs-cli

Crates.io	voirs-cli
lib.rs	voirs-cli
version	0.1.0-alpha.2
created_at	2025-09-21 06:06:23.704104+00
updated_at	2025-10-04 14:36:06.243591+00
description	Command-line interface for VoiRS speech synthesis
homepage	https://github.com/cool-japan/voirs
repository	https://github.com/cool-japan/voirs
id	1848485
size	2,516,934

KitaSan (cool-japan)

documentation

https://docs.rs/voirs-cli


voirs-cli

Command-line interface for VoiRS speech synthesis framework.

A powerful, user-friendly CLI tool for converting text to speech using the VoiRS framework. It supports batch processing, real-time synthesis, voice management, and several output formats (WAV, FLAC, MP3, Opus).

Features

Text-to-Speech Synthesis: Convert text files or direct input to high-quality audio
SSML Support: Full Speech Synthesis Markup Language processing
Voice Management: Download, list, and manage voices and models
Batch Processing: Process multiple files efficiently with progress tracking
Real-time Synthesis: Interactive mode with live audio playback
Multiple Formats: Output to WAV, FLAC, MP3, Opus, and streaming audio
Quality Control: Configurable quality settings and audio enhancement
Cross-platform: Windows, macOS, and Linux support

Installation

Pre-built Binaries

Download the latest release for your platform from GitHub Releases.

From Source

cargo install voirs-cli

Package Managers

# Homebrew (macOS/Linux)
brew install voirs

# Scoop (Windows)
scoop install voirs

# Chocolatey (Windows)
choco install voirs

Quick Start

# Basic text synthesis
voirs synth "Hello, world!" output.wav

# Use specific voice
voirs synth "Hello, world!" output.wav --voice en-US-female-calm

# SSML synthesis
voirs synth '<speak><emphasis level="strong">Hello</emphasis> world!</speak>' output.wav --ssml

# Interactive mode
voirs interactive

# List available voices
voirs voices list

Commands

`synth` - Text to Speech Synthesis

Convert text to speech audio.

voirs synth [OPTIONS] <TEXT> <OUTPUT>

# Examples
voirs synth "Hello world" hello.wav
voirs synth "Hello world" hello.wav --voice en-US-male-news
voirs synth "Bonjour le monde" bonjour.wav --voice fr-FR-female-casual
voirs synth "Hello world" hello.flac --quality high
voirs synth "Hello world" hello.mp3 --bitrate 320

Options

-v, --voice <VOICE>          Voice to use for synthesis [default: auto]
-q, --quality <QUALITY>      Synthesis quality [low|medium|high|ultra] [default: high]
-r, --sample-rate <RATE>     Output sample rate [default: 22050]
-f, --format <FORMAT>        Output format [wav|flac|mp3|opus] [default: auto]
-s, --ssml                   Input is SSML markup
    --speed <SPEED>          Speaking rate multiplier [default: 1.0]
    --pitch <PITCH>          Pitch shift in semitones [default: 0.0]
    --volume <VOLUME>        Volume adjustment in dB [default: 0.0]
    --enhance                Enable audio enhancement
    --no-normalize           Skip audio normalization
    --gpu                    Use GPU acceleration if available
    --streaming              Enable streaming synthesis for large texts
    --chunk-size <SIZE>      Chunk size for streaming [default: 256]
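`--pitch` takes semitones and `--volume` takes decibels. Assuming VoiRS applies the standard definitions of these units (not confirmed here), this sketch shows what a given flag value means as a frequency ratio or a linear amplitude gain. The function names are illustrative and not part of VoiRS.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for an equal-tempered pitch shift: 2 ** (n / 12)."""
    return 2.0 ** (semitones / 12.0)


def db_to_gain(db: float) -> float:
    """Linear amplitude factor for a level change in dB: 10 ** (dB / 20)."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    for st in (-2.0, 0.0, 0.5, 12.0):
        print(f"--pitch {st:+.1f} -> x{semitones_to_ratio(st):.3f} frequency")
    for db in (-6.0, 0.0, 3.0):
        print(f"--volume {db:+.1f} -> x{db_to_gain(db):.3f} amplitude")
```

Under these definitions, `--pitch 12` raises the pitch by one octave (doubling the frequency), and `--volume +3.0` scales the amplitude by about 1.41.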

`batch` - Batch Processing

Process multiple texts or files efficiently.

voirs batch [OPTIONS] <INPUT> <OUTPUT_DIR>

# Examples
voirs batch texts.txt ./audio/
voirs batch sentences.csv ./output/ --format flac
voirs batch book.txt ./chapters/ --split-sentences

Input Formats

# Text file (one sentence per line)
sentences.txt

# CSV file with a header row: text,output_name,voice,speed,pitch
metadata.csv

# JSON file with array of synthesis requests
requests.json

Options

-f, --format <FORMAT>        Output format for all files
-v, --voice <VOICE>          Default voice for all texts
    --split-sentences        Split long texts into sentences
    --split-paragraphs       Split texts into paragraphs
    --max-length <LENGTH>    Maximum text length per file [default: 1000]
    --parallel <N>           Number of parallel synthesis jobs [default: 4]
    --resume                 Resume interrupted batch processing
    --progress               Show detailed progress information

`interactive` - Interactive Mode

Start an interactive synthesis session.

voirs interactive [OPTIONS]

# Examples
voirs interactive
voirs interactive --voice en-US-female-calm --auto-play

Interactive Commands

> Hello, this is a test.                    # Synthesize text
> :voice en-GB-male-formal                  # Change voice
> :speed 1.2                                # Adjust speaking rate
> :pitch +0.5                               # Adjust pitch (semitones)
> :quality ultra                            # Change quality
> :save last_synthesis.wav                  # Save last synthesis
> :play                                     # Replay last synthesis
> :ssml <speak><emphasis>Hello</emphasis></speak>  # SSML mode
> :help                                     # Show help
> :quit                                     # Exit

`voices` - Voice Management

Manage available voices and models.

voirs voices <SUBCOMMAND>

# Subcommands
voirs voices list              # List available voices
voirs voices search <QUERY>    # Search for voices
voirs voices info <VOICE>      # Show voice details
voirs voices download <VOICE>  # Download voice model
voirs voices remove <VOICE>    # Remove voice model
voirs voices update            # Update voice database

Examples

# List all voices
voirs voices list

# List voices by language
voirs voices list --language en-US

# Search for female voices
voirs voices search female

# Get voice information
voirs voices info en-US-female-calm

# Download a voice
voirs voices download en-GB-male-formal

# Remove unused voices
voirs voices remove --unused

`models` - Model Management

Manage synthesis models and backends.

voirs models <SUBCOMMAND>

# Subcommands
voirs models list              # List available models
voirs models info <MODEL>      # Show model details
voirs models download <MODEL>  # Download model
voirs models remove <MODEL>    # Remove model
voirs models benchmark         # Benchmark models
voirs models optimize          # Optimize models for current hardware

Examples

# List installed models
voirs models list

# Download VITS model
voirs models download vits-en-us-female

# Benchmark all models
voirs models benchmark --output benchmark.json

# Optimize for current GPU
voirs models optimize --device cuda:0

`config` - Configuration Management

Manage VoiRS configuration and preferences.

voirs config <SUBCOMMAND>

# Subcommands
voirs config show               # Show current configuration
voirs config set <KEY> <VALUE>  # Set configuration value
voirs config reset              # Reset to defaults
voirs config export <FILE>      # Export configuration
voirs config import <FILE>      # Import configuration

Examples

# Show configuration
voirs config show

# Set default voice
voirs config set default.voice en-US-female-calm

# Set output directory
voirs config set paths.output ~/Downloads/voirs/

# Reset configuration
voirs config reset --confirm

# Export settings
voirs config export my-settings.toml

`server` - HTTP Server Mode

Start VoiRS as an HTTP API server.

voirs server [OPTIONS]

# Examples
voirs server --port 8080
voirs server --host 0.0.0.0 --port 3000 --workers 4

Options

-p, --port <PORT>           Port to listen on [default: 8080]
-h, --host <HOST>           Host to bind to [default: 127.0.0.1]
-w, --workers <N>           Number of worker threads [default: 4]
    --max-text-length <N>   Maximum text length [default: 5000]
    --rate-limit <N>        Requests per minute per IP [default: 60]
    --cors                  Enable CORS headers
    --api-key <KEY>         Require API key authentication

API Endpoints

POST /synthesize              # Synthesize text to audio
GET  /voices                  # List available voices
GET  /voices/{id}             # Get voice information
GET  /health                  # Health check
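Scripts that start `voirs server` usually need to wait until it is ready before sending requests. The following is a minimal readiness probe for the `GET /health` endpoint that uses only the Python standard library. It assumes an HTTP 200 status means healthy, since the response body format is not documented here; the helper names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def server_is_healthy(base_url: str = "http://127.0.0.1:8080", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses (HTTPError) all land here.
        return False


def wait_until_healthy(base_url: str = "http://127.0.0.1:8080", deadline_s: float = 30.0) -> bool:
    """Poll /health every 0.5 s until it succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if server_is_healthy(base_url):
            return True
        time.sleep(0.5)
    return False
```

For example, run `voirs server --port 8080` in the background, then call `wait_until_healthy()` before posting to `/synthesize`.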

`benchmark` - Performance Testing

Run performance benchmarks and quality tests.

voirs benchmark [OPTIONS]

# Examples
voirs benchmark --voices en-US-female-calm,en-GB-male-formal
voirs benchmark --output benchmark.json --detailed

Options

-v, --voices <VOICES>       Comma-separated list of voices to test
-o, --output <FILE>         Output results to file
    --detailed              Include detailed metrics
    --quality               Run quality tests (requires reference audio)
    --rtf                   Measure real-time factor
    --memory                Monitor memory usage
    --gpu-usage             Monitor GPU utilization

Configuration

VoiRS uses a hierarchical configuration system. When the same setting is defined in several places, the source higher in this list wins:

1. Command-line arguments
2. Environment variables
3. User configuration file (~/.voirs/config.toml)
4. System configuration file (/etc/voirs/config.toml)
5. Default values
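These precedence rules behave like a chain of lookups in which the first layer that defines a key wins. Python's `collections.ChainMap` models exactly that. This illustrates the lookup order only and is not VoiRS's implementation. The flat dotted keys (in the style of `voirs config set default.voice`) and all values below are assumptions for the example.

```python
from collections import ChainMap


def resolve_settings(cli, env, user_file, system_file, defaults):
    """Layer settings so that earlier arguments override later ones.

    Each layer is a flat dict keyed like "default.voice". Include only the
    keys that were actually set in that layer; a missing key falls through
    to the next layer.
    """
    return ChainMap(cli, env, user_file, system_file, defaults)


settings = resolve_settings(
    cli={"default.quality": "ultra"},
    env={"default.voice": "en-US-male-news"},
    user_file={"default.voice": "en-US-female-calm", "default.format": "flac"},
    system_file={},
    defaults={"default.voice": "auto", "default.quality": "high",
              "default.format": "wav", "default.sample_rate": 22050},
)
print(settings["default.voice"])  # en-US-male-news (environment beats user file)
```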

Configuration File

# ~/.voirs/config.toml

[default]
voice = "en-US-female-calm"
quality = "high"
sample_rate = 22050
format = "wav"

[paths]
models = "~/.voirs/models/"
cache = "~/.voirs/cache/"
output = "~/Downloads/"

[synthesis]
gpu_acceleration = true
streaming = false
chunk_size = 256
enhance_audio = true
normalize_output = true

[voices]
auto_download = true
preferred_languages = ["en-US", "en-GB"]
fallback_voice = "en-US-female-neutral"

[server]
host = "127.0.0.1"
port = 8080
workers = 4
max_text_length = 5000
rate_limit = 60

[batch]
parallel_jobs = 4
progress_reporting = true
resume_enabled = true
auto_split = true

[advanced]
backend = "candle"              # candle, onnx
device = "auto"                 # auto, cpu, cuda:0, metal
precision = "fp32"              # fp16, fp32
memory_limit = "4GB"
log_level = "info"              # error, warn, info, debug, trace

Environment Variables

# Override configuration with environment variables
export VOIRS_DEFAULT_VOICE="en-US-male-news"
export VOIRS_SYNTHESIS_GPU_ACCELERATION="true"
export VOIRS_PATHS_MODELS="/custom/models/path"
export VOIRS_LOG_LEVEL="debug"
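The first three variables above appear to follow a `VOIRS_<SECTION>_<KEY>` pattern that mirrors the configuration file. For example, `VOIRS_SYNTHESIS_GPU_ACCELERATION` corresponds to `gpu_acceleration` under `[synthesis]`. `VOIRS_LOG_LEVEL` does not fit this pattern and is presumably a special case. The sketch below encodes the inferred convention. It is a guess drawn from the examples, not a documented rule, and the helper names are hypothetical.

```python
import os

# Section names taken from the example config.toml above (assumed complete).
KNOWN_SECTIONS = {"default", "paths", "synthesis", "voices", "server", "batch", "advanced"}


def env_to_config_key(name: str):
    """Map e.g. VOIRS_PATHS_MODELS -> "paths.models"; return None if it doesn't fit."""
    prefix = "VOIRS_"
    if not name.startswith(prefix):
        return None
    section, _, key = name[len(prefix):].lower().partition("_")
    if section not in KNOWN_SECTIONS or not key:
        return None  # e.g. VOIRS_LOG_LEVEL, which must be handled separately
    return f"{section}.{key}"


def voirs_overrides(environ=os.environ) -> dict:
    """Collect VOIRS_* variables that follow the inferred pattern, as dotted keys."""
    out = {}
    for name, value in environ.items():
        key = env_to_config_key(name)
        if key is not None:
            out[key] = value
    return out
```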

Output Formats

WAV (Uncompressed)

voirs synth "Hello" output.wav --sample-rate 44100 --bit-depth 24

FLAC (Lossless Compression)

voirs synth "Hello" output.flac --compression-level 8

MP3 (Lossy Compression)

voirs synth "Hello" output.mp3 --bitrate 320 --quality high

Opus (Modern Codec)

voirs synth "Hello" output.opus --bitrate 128 --application audio
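When choosing between these formats, an estimate of output size is often useful. Uncompressed WAV size follows directly from sample rate, bit depth, and channel count. Constant-bitrate codecs grow at bitrate / 8 bytes per second. This sketch assumes mono output and that `--bitrate` is given in kbit/s, which the value 320 (the usual MP3 maximum) suggests. FLAC and variable-bitrate Opus sizes depend on the content and cannot be computed this way.

```python
def pcm_bytes(sample_rate: int, bit_depth: int, channels: int, seconds: float) -> int:
    """Size of raw PCM sample data, excluding the canonical 44-byte WAV header."""
    bytes_per_sample = bit_depth // 8
    return int(sample_rate * bytes_per_sample * channels * seconds)


def cbr_bytes(bitrate_kbps: float, seconds: float) -> int:
    """Approximate size of a constant-bitrate stream (container overhead ignored)."""
    return int(bitrate_kbps * 1000 / 8 * seconds)


if __name__ == "__main__":
    ten_s = 10.0
    print("WAV 44.1 kHz / 24-bit mono:", pcm_bytes(44100, 24, 1, ten_s))
    print("MP3 320 kbps:", cbr_bytes(320, ten_s))
    print("Opus 128 kbps:", cbr_bytes(128, ten_s))
```

For 10 seconds of audio, this gives about 1.32 MB for 24-bit WAV at 44.1 kHz, 400 kB for MP3 at 320 kbps, and 160 kB for Opus at 128 kbps.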

Streaming Audio

# Stream to system audio output
voirs synth "Hello world" --play

# Stream to file while playing
voirs synth "Hello world" output.wav --play --streaming

SSML Support

VoiRS supports Speech Synthesis Markup Language (SSML) for advanced speech control.

Basic SSML

voirs synth '<speak>Hello <emphasis level="strong">world</emphasis>!</speak>' output.wav --ssml

Advanced SSML Examples

<!-- Prosody control -->
<speak>
  <prosody rate="slow" pitch="low" volume="soft">
    This is spoken slowly, in a low pitch, and softly.
  </prosody>
</speak>

<!-- Pauses and breaks -->
<speak>
  Step 1. <break time="1s"/> Step 2. <break time="500ms"/> Step 3.
</speak>

<!-- Phonetic pronunciation -->
<speak>
  You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>,
  I say <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>.
</speak>

<!-- Voice selection -->
<speak>
  <voice name="en-US-female-calm">This is a calm female voice.</voice>
  <voice name="en-US-male-energetic">This is an energetic male voice!</voice>
</speak>

<!-- Language switching -->
<speak xml:lang="en-US">
  Hello! <span xml:lang="es-ES">¡Hola!</span> 
  <span xml:lang="fr-FR">Bonjour!</span>
</speak>
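When SSML is generated from arbitrary text, characters such as `&` and `<` must be escaped, or the markup becomes invalid. Building the document with Python's `xml.etree.ElementTree` avoids hand-escaping. The `build_ssml` helper and its input format are illustrative and not part of VoiRS.

```python
import xml.etree.ElementTree as ET


def build_ssml(parts):
    """Build a <speak> document from a list of parts.

    Each part is either a plain string (spoken text) or a
    (tag, attributes, inner_text) tuple such as ("break", {"time": "1s"}, None).
    ElementTree escapes &, < and > in text and attributes automatically.
    """
    speak = ET.Element("speak")
    last = None  # most recent child element; text after it becomes its tail
    for part in parts:
        if isinstance(part, str):
            if last is None:
                speak.text = (speak.text or "") + part
            else:
                last.tail = (last.tail or "") + part
        else:
            tag, attrs, inner = part
            last = ET.SubElement(speak, tag, attrs)
            if inner is not None:
                last.text = inner
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml([
        "Step 1 & 2. ",
        ("break", {"time": "1s"}, None),
        " Step 3 is ",
        ("emphasis", {"level": "strong"}, "important"),
        ".",
    ]))
```

The resulting string can be passed as the `<TEXT>` argument of `voirs synth` together with `--ssml`.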

Batch Processing

Text File Input

# sentences.txt
Hello, this is the first sentence.
This is the second sentence.
And this is the third sentence.

voirs batch sentences.txt ./output/ --voice en-US-female-calm

CSV Input with Metadata

text,output_name,voice,speed,pitch
"Hello world",hello,en-US-female-calm,1.0,0.0
"Bonjour le monde",bonjour,fr-FR-female-casual,1.1,0.5
"Hola mundo",hola,es-ES-male-news,0.9,-0.2

voirs batch metadata.csv ./output/ --format flac
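Hand-editing CSV can break on sentences that contain commas or quotes. Generating the file with Python's `csv` module handles quoting automatically. The column names below are copied from the example above and are assumed to be what `voirs batch` expects. `csv` quotes only fields that need it, which is equally valid CSV even though the example quotes every text field.

```python
import csv
import io

FIELDS = ["text", "output_name", "voice", "speed", "pitch"]


def write_batch_csv(rows, fp) -> None:
    """Write a header line plus rows (dicts keyed by FIELDS) to an open text file.

    The csv module quotes a field only when needed (commas, quotes, newlines),
    so sentences containing commas survive intact.
    """
    writer = csv.DictWriter(fp, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"text": "Hello, world", "output_name": "hello", "voice": "en-US-female-calm", "speed": 1.0, "pitch": 0.0},
    {"text": "Hola mundo", "output_name": "hola", "voice": "es-ES-male-news", "speed": 0.9, "pitch": -0.2},
]

# For a real file use: open("metadata.csv", "w", newline="", encoding="utf-8")
buf = io.StringIO()
write_batch_csv(rows, buf)
print(buf.getvalue())
```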

JSON Input with Full Control

[
  {
    "text": "Hello, world!",
    "output": "hello.wav",
    "voice": "en-US-female-calm",
    "quality": "high",
    "ssml": false,
    "effects": {
      "speed": 1.0,
      "pitch": 0.0,
      "volume": 0.0
    }
  },
  {
    "text": "<speak><emphasis>Important</emphasis> announcement!</speak>",
    "output": "announcement.wav", 
    "voice": "en-US-male-formal",
    "quality": "ultra",
    "ssml": true
  }
]

voirs batch requests.json ./output/

Performance Optimization

GPU Acceleration

# Use GPU if available
voirs synth "Hello world" output.wav --gpu

# Specify GPU device
CUDA_VISIBLE_DEVICES=0 voirs synth "Hello world" output.wav --gpu

# Benchmark GPU performance
voirs benchmark --gpu-usage --voices en-US-female-calm

Streaming for Long Texts

# Enable streaming for reduced latency
voirs synth "Very long text..." output.wav --streaming --chunk-size 512

# Stream text piped in on standard input ("-")
echo "Long text content" | voirs synth - output.wav --streaming

Parallel Batch Processing

# Process with 8 parallel jobs
voirs batch large_dataset.txt ./output/ --parallel 8

# Show detailed progress information
voirs batch large_dataset.txt ./output/ --parallel 4 --progress

Audio Quality Enhancement

Basic Enhancement

voirs synth "Hello world" output.wav --enhance

Advanced Audio Processing

# Custom quality settings
voirs synth "Hello world" output.wav \
  --quality ultra \
  --enhance \
  --volume +3.0 \
  --sample-rate 48000

# Professional audio settings
voirs synth "Hello world" broadcast.wav \
  --quality ultra \
  --enhance \
  --format wav \
  --sample-rate 48000 \
  --bit-depth 24 \
  --no-normalize  # Skip normalization for professional workflow

Troubleshooting

Common Issues

Voice not found:

# List available voices
voirs voices list

# Download missing voice
voirs voices download en-US-female-calm

GPU not working:

# Check whether GPU acceleration is enabled
voirs config show | grep gpu

# Force CPU mode
voirs synth "Hello" output.wav --device cpu

Poor audio quality:

# Try higher quality settings
voirs synth "Hello" output.wav --quality ultra --enhance

# Use a higher output sample rate
voirs synth "Hello" output.wav --sample-rate 48000

Memory issues:

# Enable streaming for large texts (reading from stdin avoids shell argument-length limits)
voirs synth - output.wav --streaming < large_text.txt

# Reduce chunk size
voirs synth - output.wav --streaming --chunk-size 128 < large_text.txt

Debug Mode

# Enable verbose logging
VOIRS_LOG_LEVEL=debug voirs synth "Hello" output.wav

# Save debug information
voirs synth "Hello" output.wav --debug --debug-output debug.json

Performance Issues

# Profile synthesis performance
voirs benchmark --voices en-US-female-calm --detailed

# Check system resources
voirs benchmark --memory --gpu-usage

# Optimize models for your hardware
voirs models optimize --device auto

Integration Examples

Shell Scripts

#!/bin/bash
# text_to_speech.sh - Convert text files to audio
shopt -s nullglob  # skip the loop if there are no .txt files

for file in *.txt; do
    echo "Processing $file..."
    voirs synth "$(cat "$file")" "${file%.txt}.wav" \
        --voice en-US-female-calm \
        --quality high
done

Python Integration

import subprocess
import json

def synthesize_text(text, output_file, voice="en-US-female-calm"):
    """Synthesize text using VoiRS CLI"""
    cmd = [
        "voirs", "synth", text, output_file,
        "--voice", voice,
        "--quality", "high"
    ]
    
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Synthesis failed: {result.stderr}")
    
    return output_file

# Usage
synthesize_text("Hello, world!", "greeting.wav")
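Very long inputs passed as a single command-line argument, as in `synthesize_text` above, can run into the operating system's argument-size limit. Because `voirs synth` accepts `-` to read text from standard input (as in the streaming example earlier), a variant can pipe the text in instead. The helper names here are illustrative.

```python
import subprocess


def build_stdin_cmd(output_file, voice="en-US-female-calm"):
    """Command line for synthesizing text supplied on stdin ("-")."""
    return ["voirs", "synth", "-", output_file, "--voice", voice, "--streaming"]


def synthesize_long_text(text, output_file, voice="en-US-female-calm"):
    """Pipe arbitrarily long text to VoiRS via stdin instead of argv."""
    result = subprocess.run(
        build_stdin_cmd(output_file, voice),
        input=text,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"Synthesis failed: {result.stderr}")
    return output_file
```

Usage: `synthesize_long_text(open("book.txt", encoding="utf-8").read(), "book.wav")`.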

Web Integration

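// Node.js example using child_process
const { execFile } = require('child_process');

function synthesizeText(text, outputFile) {
    return new Promise((resolve, reject) => {
        // Pass arguments as an array so the text is never interpreted by a shell
        // (quotes, $ or backticks in the text cannot break or inject commands).
        const args = ['synth', text, outputFile, '--quality', 'high'];

        execFile('voirs', args, (error, stdout, stderr) => {
            if (error) {
                reject(new Error(`Synthesis failed: ${stderr || error.message}`));
            } else {
                resolve(outputFile);
            }
        });
    });
}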

// Usage
synthesizeText("Hello from Node.js!", "greeting.wav")
    .then(file => console.log(`Audio saved to ${file}`))
    .catch(err => console.error(`Error: ${err.message}`));

Contributing

We welcome contributions! Please see the main repository for contribution guidelines.

Development Setup

git clone https://github.com/cool-japan/voirs.git
cd voirs/crates/voirs-cli

# Install development dependencies
cargo install cargo-nextest

# Run tests
cargo nextest run

# Run CLI locally
cargo run -- synth "Hello world" test.wav

# Build release version
cargo build --release

License

Licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE)
MIT license (LICENSE-MIT)

at your option.

Commit count: 2
