| Crates.io | kitsune-stt |
| lib.rs | kitsune-stt |
| version | 0.1.0 |
| created_at | 2025-10-31 20:35:35.408346+00 |
| updated_at | 2025-10-31 20:35:35.408346+00 |
| description | Speech-to-Text tool using Candle and Voxtral |
| homepage | https://github.com/paazmaya/kitsune-stt |
| repository | |
| max_upload_size | |
| id | 1910729 |
| size | 313,271 |
A Speech-to-Text implementation of Voxtral speech recognition in Rust, using Candle.

The model used in the conversion is https://huggingface.co/mistralai/Voxtral-Mini-3B-2507
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone the repository
git clone https://github.com/paazmaya/kitsune-stt.git
cd kitsune-stt
There are two optional Cargo features, cuda and cudnn. Each requires additional libraries:
GPU (Recommended):
cargo run --all-features --release -- audio.wav
CPU Only:
cargo run --release -- --cpu audio.wav
The following example commands create an audio.txt file in the same folder as the source audio file.
# Transcribe an audio file
cargo run --release -- --input audio.wav
# Force CPU mode
cargo run --release --features cuda -- --cpu --input audio.wav
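The output naming described above (audio.wav producing audio.txt next to it) can be sketched in Rust as follows. This is only an illustration under the assumption that the tool swaps the file extension; `transcript_path` is a hypothetical helper and is not taken from the kitsune-stt source.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (an assumption, not kitsune-stt's actual code):
/// derive the transcript path by replacing the audio file's extension
/// with `txt`, keeping the same parent folder and file stem.
fn transcript_path(audio: &Path) -> PathBuf {
    audio.with_extension("txt")
}
```

Calling `transcript_path(Path::new("recordings/audio.wav"))` yields `recordings/audio.txt`, so the transcript lands in the same folder as the source audio.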
Run the complete test suite:
# All tests
cargo test
# With all features
cargo test --all-features
# Format code
cargo fmt --all
# Run clippy lints
cargo clippy --all-targets --all-features -- -D warnings
# Security audit
cargo install cargo-audit
cargo audit
See CONTRIBUTING.md for detailed development guidelines.
MIT License - see LICENSE file for details.