| Crates.io | vtt-rs |
| lib.rs | vtt-rs |
| version | 0.1.3 |
| created_at | 2025-11-17 14:09:50.905078+00 |
| updated_at | 2025-11-19 23:32:06.861074+00 |
| description | Library and CLI for streaming microphone input to OpenAI compatible transcription APIs |
| homepage | |
| repository | https://github.com/geoffsee/vtt-rs |
| max_upload_size | |
| id | 1936840 |
| size | 195,340 |
A Rust library and command-line utility for real-time audio transcription using OpenAI-compatible APIs. Perfect for adding situational awareness to AI agents through speech recognition.
Build the API documentation locally with:
cargo doc --no-deps --open
If your endpoint requires authentication, set OPENAI_API_KEY in your environment. For local OpenAI-compatible servers that don't require auth, this can be omitted.
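With a hosted endpoint such as the official OpenAI API, exporting the key before launching is enough (sk-... is a placeholder for your real key):

export OPENAI_API_KEY=sk-...
cargo run -- vtt.config.json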
The binary reads an optional JSON configuration file: by default vtt.config.json in the current directory, or an alternate path passed as the first argument.
Supported keys (all optional; sensible defaults exist):
{
"chunk_duration_secs": 5,
"model": "whisper-1",
"endpoint": "https://api.openai.com/v1/audio/transcriptions",
"out_file": "transcripts.log",
"on_device": {
"enabled": false,
"model": "tiny.en",
"cpu": true
}
}
- chunk_duration_secs: duration of each captured audio block that is transcribed.
- model: which OpenAI transcription model to hit.
- endpoint: custom transcription endpoint, e.g. for a proxy service.
- out_file: path to append every transcription (chunk ID + contents).
- on_device: optional block to turn on the bundled Candle Whisper runner.

Set on_device.enabled to true in your config to run Whisper locally without calling the OpenAI API. You can pick from the built-in model shortcuts ("tiny", "small", etc.), force CPU execution, and optionally select a specific input device.
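As a sketch, a config that transcribes entirely on-device (using only the keys shown above) could look like:

{
  "chunk_duration_secs": 5,
  "out_file": "transcripts.log",
  "on_device": {
    "enabled": true,
    "model": "tiny.en",
    "cpu": true
  }
}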
You can use a local OpenAI-compatible server that serves the MLX model mlx-community/parakeet-tdt-0.6b-v2. Point endpoint to your server and set model accordingly. No OPENAI_API_KEY is required when the server does not enforce auth.
Example config snippet:
{
"chunk_duration_secs": 3,
"model": "mlx-community/parakeet-tdt-0.6b-v2",
"endpoint": "http://localhost:8000/v1/audio/transcriptions",
"out_file": "transcripts.log"
}
Then run the CLI without setting OPENAI_API_KEY:
cargo run -- vtt.config.json
Notes:
- Use whatever value your local server accepts as the model identifier.

Add vtt-rs to your Cargo.toml:
[dependencies]
vtt-rs = { git = "https://github.com/geoffsee/vtt-rs" }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
use vtt_rs::{Config, TranscriptionEvent, TranscriptionService};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let config = Config::default();
let api_key = std::env::var("OPENAI_API_KEY")?;
let mut service = TranscriptionService::new(config, api_key)?;
let (mut receiver, _stream) = service.start().await?;
// Process transcription events
while let Some(event) = receiver.recv().await {
match event {
TranscriptionEvent::Transcription { chunk_id, text } => {
println!("Heard: {}", text);
// Feed this to your AI agent for situational awareness
}
TranscriptionEvent::Error { chunk_id, error } => {
eprintln!("Error: {}", error);
}
}
}
Ok(())
}
The library is designed to give AI agents "ears" - the ability to perceive and respond to their audio environment. Check out the examples:
- examples/ai_agent.rs - Basic AI agent with audio awareness
- examples/streaming_agent.rs - Advanced agent with temporal context

Run examples with:
OPENAI_API_KEY=sk-... cargo run --example ai_agent
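One way to give an agent short-term context is to keep a rolling window of recent transcriptions and fold it into the agent's prompt. The sketch below is only an illustration built on the TranscriptionEvent shown earlier; the window size and prompt formatting are assumptions, not part of the library:

use std::collections::VecDeque;

// Rolling window of recent transcription chunks, used as agent context.
struct RollingContext {
    window: usize,
    chunks: VecDeque<String>,
}

impl RollingContext {
    fn new(window: usize) -> Self {
        Self { window, chunks: VecDeque::new() }
    }

    // Push a new chunk, dropping the oldest once the window is full.
    fn push(&mut self, text: String) {
        if self.chunks.len() == self.window {
            self.chunks.pop_front();
        }
        self.chunks.push_back(text);
    }

    // Join the retained chunks into a single prompt fragment.
    fn as_prompt(&self) -> String {
        self.chunks.iter().cloned().collect::<Vec<_>>().join(" ")
    }
}

// In the event loop from the example above:
//     TranscriptionEvent::Transcription { text, .. } => context.push(text),
// then pass context.as_prompt() to your agent on each turn.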
# With OpenAI or any endpoint requiring auth
OPENAI_API_KEY=sk-... cargo run -- vtt.config.json
# With a local server that does not require auth
cargo run -- vtt.config.json
The CLI loads vtt.config.json from the current directory if it exists; otherwise it runs with defaults. If out_file is set, every transcription is appended to that file in addition to the console output.
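Because out_file is plain appended text, you can also watch transcriptions live with a standard tail (the filename matches the default config shown above):

tail -f transcripts.log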