vtt-rs

A Rust library and command-line utility for real-time audio transcription using OpenAI-compatible APIs. Perfect for adding situational awareness to AI agents through speech recognition.

Documentation

Build the documentation locally:

cargo doc --no-deps --open

Configuration

  • If your endpoint requires authentication, set OPENAI_API_KEY in your environment. For local OpenAI-compatible servers that don't require auth, this can be omitted.

  • The binary reads an optional JSON configuration file (vtt.config.json in the current directory by default; pass an alternate path as the first argument).

  • Supported keys (all optional; sensible defaults exist):

    {
      "chunk_duration_secs": 5,
      "model": "whisper-1",
      "endpoint": "https://api.openai.com/v1/audio/transcriptions",
      "out_file": "transcripts.log",
      "on_device": {
        "enabled": false,
        "model": "tiny.en",
        "cpu": true
      }
    }
    
    • chunk_duration_secs: duration of each captured audio block that is transcribed.
    • model: which transcription model to request.
    • endpoint: custom transcription endpoint, e.g. for a proxy service.
    • out_file: file to which every transcription (chunk ID + contents) is appended.
    • on_device: optional block that enables the bundled Candle Whisper runner.
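
If you build the configuration in code instead of JSON, Config::default() (used in the library example below) supplies the same defaults. As a hypothetical sketch only, assuming the Rust struct's public fields mirror the JSON keys above (verify against the generated API docs):

use vtt_rs::Config;

// Assumption: field names mirror the JSON keys; not confirmed by this README.
let config = Config {
    chunk_duration_secs: 3,          // transcribe 3-second audio blocks
    model: "whisper-1".to_string(),  // remote model identifier
    ..Config::default()              // keep defaults for everything else
};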

On-Device Whisper

Set on_device.enabled to true in your config to run Whisper locally without calling the OpenAI API. You can pick from the built-in model shortcuts ("tiny", "small", etc.), force CPU execution, and optionally select a specific input device.
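
For example, a minimal config that transcribes on-device with the tiny model on CPU, using only keys shown in the sample above:

{
  "chunk_duration_secs": 5,
  "on_device": {
    "enabled": true,
    "model": "tiny",
    "cpu": true
  }
}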

Local MLX Parakeet (no API key)

You can use a local OpenAI-compatible server that serves the MLX model mlx-community/parakeet-tdt-0.6b-v2. Point endpoint to your server and set model accordingly. No OPENAI_API_KEY is required when the server does not enforce auth.

Example config snippet:

{
  "chunk_duration_secs": 3,
  "model": "mlx-community/parakeet-tdt-0.6b-v2",
  "endpoint": "http://localhost:8000/v1/audio/transcriptions",
  "out_file": "transcripts.log"
}

Then run the CLI without setting OPENAI_API_KEY:

cargo run -- vtt.config.json

Notes:

  • Ensure your local server implements an OpenAI-compatible audio transcription endpoint and understands the model identifier (a quick curl check is sketched after these notes).
  • On-device mode in this repo currently supports Whisper via Candle. Parakeet support is provided via the remote endpoint path as shown above.
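
To sanity-check the server before pointing vtt-rs at it, you can exercise the endpoint directly. The OpenAI transcription API accepts a multipart form with file and model fields; sample.wav below is a placeholder for any short audio file you have on hand:

curl -F "file=@sample.wav" \
     -F "model=mlx-community/parakeet-tdt-0.6b-v2" \
     http://localhost:8000/v1/audio/transcriptions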

Usage as a Library

Add vtt-rs to your Cargo.toml:

[dependencies]
vtt-rs = { git = "https://github.com/geoffsee/vtt-rs" }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }

Basic Example

use vtt_rs::{Config, TranscriptionEvent, TranscriptionService};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config::default();
    let api_key = std::env::var("OPENAI_API_KEY")?;

    let mut service = TranscriptionService::new(config, api_key)?;
    // Binding `_stream` (rather than `_`) keeps the capture stream alive
    // for the duration of the loop.
    let (mut receiver, _stream) = service.start().await?;

    // Process transcription events as they arrive
    while let Some(event) = receiver.recv().await {
        match event {
            TranscriptionEvent::Transcription { chunk_id, text } => {
                println!("[chunk {}] Heard: {}", chunk_id, text);
                // Feed this to your AI agent for situational awareness
            }
            TranscriptionEvent::Error { chunk_id, error } => {
                eprintln!("[chunk {}] Error: {}", chunk_id, error);
            }
        }
    }

    Ok(())
}

AI Agent Integration

The library is designed to give AI agents "ears": the ability to perceive and respond to their audio environment. Check out the examples:

  • examples/ai_agent.rs - Basic AI agent with audio awareness
  • examples/streaming_agent.rs - Advanced agent with temporal context

Run examples with:

OPENAI_API_KEY=sk-... cargo run --example ai_agent
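
Both examples follow the same pattern: consume TranscriptionEvent values and fold them into the agent's context. The sketch below illustrates one way to do that with a rolling window of recent utterances; the buffer is illustrative rather than the examples' actual code, and it assumes only the TranscriptionService API shown above:

use std::collections::VecDeque;
use vtt_rs::{Config, TranscriptionEvent, TranscriptionService};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let api_key = std::env::var("OPENAI_API_KEY")?;
    let mut service = TranscriptionService::new(Config::default(), api_key)?;
    let (mut receiver, _stream) = service.start().await?;

    // Keep the last few utterances as rolling context for the agent.
    const WINDOW: usize = 8;
    let mut context: VecDeque<String> = VecDeque::with_capacity(WINDOW);

    while let Some(event) = receiver.recv().await {
        if let TranscriptionEvent::Transcription { text, .. } = event {
            if context.len() == WINDOW {
                context.pop_front();
            }
            context.push_back(text);
            // Hand the joined window to your agent as its current "hearing".
            let heard = context.iter().cloned().collect::<Vec<_>>().join(" ");
            println!("agent context: {}", heard);
        }
    }
    Ok(())
}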

Usage as a CLI

# With OpenAI or any endpoint requiring auth
OPENAI_API_KEY=sk-... cargo run -- vtt.config.json

# With a local server that does not require auth
cargo run -- vtt.config.json

  • Omit the CLI argument to let the tool load vtt.config.json from the current directory if it exists; otherwise it runs with defaults.
  • Transcripts are printed live and, when out_file is set, appended to that file in addition to the console output.

Features

  • Real-time transcription: Continuously captures and transcribes audio
  • Event-driven API: React to transcriptions as they happen
  • Configurable chunking: Adjust audio chunk duration for your needs
  • OpenAI compatible: Works with OpenAI Whisper and compatible APIs
  • Async/await: Built on Tokio for efficient async processing
  • Type-safe: Strongly typed events and configuration