video-transcriber-mcp — crates.io
Version: 0.4.0
Created: 2025-11-26 · Updated: 2026-01-10
Description: High-performance video transcription MCP server using whisper.cpp for faster transcription
Repository: https://github.com/nhatvu148/video-transcriber-mcp-rs
Author: Nhat-Vu Nguyen (nhatvu148)
Crate size: 209,313 bytes

README

Video Transcriber MCP 🚀

High-performance video transcription MCP server using whisper.cpp (Rust)

License: MIT Rust Version

A Model Context Protocol (MCP) server that transcribes videos from 1000+ platforms using whisper.cpp. Built with Rust for maximum performance and efficiency.

📦 Installation

Homebrew (macOS/Linux) - Recommended

The easiest way to install with all dependencies:

brew install nhatvu148/tap/video-transcriber-mcp

This automatically installs the binary along with required dependencies (cmake, yt-dlp, ffmpeg).

Cargo Install

If you have Rust installed:

cargo install video-transcriber-mcp

Note: You'll need to manually install dependencies: yt-dlp, ffmpeg, cmake
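Because `cargo install` does not pull in these external tools, a quick pre-flight check can save a confusing first run. This is an illustrative helper, not part of the crate:

```shell
# Report which of the required external tools are missing from PATH.
# (Illustrative snippet, not shipped with the crate.)
missing=""
for tool in yt-dlp ffmpeg cmake; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "missing:$missing"
else
  echo "all dependencies found"
fi
```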

Pre-built Binaries

Download from GitHub Releases:

# macOS (Intel)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-x86_64-apple-darwin.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# macOS (Apple Silicon)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-aarch64-apple-darwin.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# Windows: Download .zip from releases page

Note: You'll need to manually install dependencies: yt-dlp, ffmpeg

🎯 Why Rust?

This version uses whisper.cpp (C++ implementation with Rust bindings) instead of Python's OpenAI Whisper:

| Advantage | whisper.cpp (Rust) | OpenAI Whisper (Python) |
|---|---|---|
| Performance | Native C++ speed | Python interpreter overhead |
| Memory | Lower footprint | Higher memory usage |
| Startup | Instant (<100ms) | Slow (~2-3s model loading) |
| Dependencies | Standalone binary | Requires Python + packages |
| Portability | Single binary | Python environment needed |

Real-world performance depends on your hardware, video length, and chosen model.

✨ Features

  • 🚀 High-performance transcription using whisper.cpp (C++ with Rust bindings)
  • 🎥 Download from 1000+ platforms (YouTube, Vimeo, TikTok, Twitter, etc.)
  • 📂 Transcribe local video files (mp4, avi, mov, mkv, etc.)
  • 🎤 100% offline transcription (privacy-first)
  • 🎛️ 5 model sizes (tiny, base, small, medium, large)
  • 🌍 90+ languages supported
  • 📝 Multiple output formats (TXT, JSON, Markdown)
  • 🔌 MCP integration for Claude Code
  • 🌐 Dual transport - stdio (local) and Streamable HTTP (remote)
  • ⚡ Native binary - no Python or Node.js required
  • 💾 Low memory footprint compared to Python implementations

⚡ Quick Start (Using Taskfile)

The fastest way to get started:

# 1. Install Task (if not already installed)
brew install go-task/tap/go-task

# 2. Complete setup (build + download model)
task setup

# 3. Run a quick test
task test:quick

# Done! 🎉

Available Commands:

task setup           # Complete project setup
task test:quick      # Test with short video
task benchmark       # Run performance benchmark
task deps:check      # Check dependencies
task download:base   # Download base model
task help            # Show all commands

See Taskfile.yml for all available tasks.


๐ŸŒ Transport Modes

The server supports two transport modes:

Stdio Transport (Default)

Standard I/O transport for local CLI usage with Claude Code. This is the default mode.

video-transcriber-mcp
# or explicitly:
video-transcriber-mcp --transport stdio

Streamable HTTP Transport

HTTP transport for remote access. Allows the MCP server to be accessed over the network.

# Start HTTP server on default port (8080)
video-transcriber-mcp --transport http

# Custom host and port
video-transcriber-mcp --transport http --host 0.0.0.0 --port 3000

Remote MCP Client Configuration:

For HTTP transport, configure your MCP client with the URL:

{
  "mcpServers": {
    "video-transcriber-mcp": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
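Assuming the server implements the standard MCP Streamable HTTP handshake, the first thing a client sends is a JSON-RPC `initialize` request. The sketch below is illustrative rather than an official client: the `/mcp` endpoint comes from the config above, and the protocol version string is the MCP spec revision that introduced Streamable HTTP.

```shell
# Illustrative first request against a running server on :8080.
# The payload follows the generic MCP JSON-RPC handshake, not a
# video-transcriber-mcp-specific API.
payload='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-probe","version":"0.0.1"}}}'
echo "$payload"
# curl -s -X POST http://localhost:8080/mcp \
#   -H 'Content-Type: application/json' \
#   -H 'Accept: application/json, text/event-stream' \
#   -d "$payload"
```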

Benefits of HTTP Transport:

  • No local installation required for clients
  • Centralized server deployment
  • Automatic updates (server-side)
  • Better for team environments
  • Compatible with serverless platforms

CLI Options

video-transcriber-mcp --help

Options:
  -t, --transport <TRANSPORT>  Transport mode [default: stdio] [possible values: stdio, http]
      --host <HOST>            Host address for HTTP transport [default: 127.0.0.1]
  -p, --port <PORT>            Port for HTTP transport [default: 8080]
  -h, --help                   Print help
  -V, --version                Print version

📦 Manual Build from Source

Prerequisites

  1. Rust 1.85+ (for the Rust 2024 edition)

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

  2. yt-dlp (for downloading videos)

# macOS
brew install yt-dlp

# Linux
pip install yt-dlp

# Windows
winget install yt-dlp.yt-dlp

  3. FFmpeg (for audio processing)

# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg  # Debian/Ubuntu
sudo dnf install ffmpeg  # Fedora

# Windows
choco install ffmpeg

Build from Source

# Clone the repository
git clone https://github.com/nhatvu148/video-transcriber-mcp-rs.git
cd video-transcriber-mcp-rs

# Build the project
cargo build --release

# The binary will be at: target/release/video-transcriber-mcp

Download Whisper Models

# Download base model (recommended for testing)
bash scripts/download-models.sh base

# Or download all models
bash scripts/download-models.sh all

Models are stored in ~/.cache/video-transcriber-mcp/models/
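whisper.cpp's convention is to name model files `ggml-<size>.bin`; assuming the download script follows it, you can check which sizes are already present. Paths use the documented default location and the `WHISPER_MODELS_DIR` override from the Configuration section:

```shell
# List which Whisper models are already downloaded, assuming the
# whisper.cpp "ggml-<size>.bin" naming convention.
MODELS_DIR="${WHISPER_MODELS_DIR:-$HOME/.cache/video-transcriber-mcp/models}"
for size in tiny base small medium large; do
  f="$MODELS_DIR/ggml-$size.bin"
  if [ -f "$f" ]; then echo "present: $f"; else echo "absent:  $f"; fi
done
```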

🚀 Quick Start

MCP Server (for Claude Code)

Add to ~/.claude/settings.json:

Option 1: If installed via GitHub Release or cargo install:

{
  "mcpServers": {
    "video-transcriber-mcp": {
      "command": "video-transcriber-mcp",
      "args": [],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}

Option 2: If built from source:

{
  "mcpServers": {
    "video-transcriber-mcp": {
      "command": "/absolute/path/to/video-transcriber-mcp-rs/target/release/video-transcriber-mcp",
      "args": [],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}

Then use in Claude Code:

Basic transcription (uses base model by default):

Please transcribe this YouTube video: https://www.youtube.com/watch?v=VIDEO_ID

Transcribe with specific model:

Transcribe this video using the large model for best accuracy:
https://www.youtube.com/watch?v=VIDEO_ID

Transcribe local video file:

Transcribe this local video file: /Users/myname/Videos/meeting.mp4

Transcribe in specific language:

Transcribe this Spanish video: https://www.youtube.com/watch?v=VIDEO_ID
(language: es, model: medium)

📊 Performance

Expected Performance Characteristics

Based on whisper.cpp vs OpenAI Whisper benchmarks from the community:

Transcription Speed (approximate, varies by hardware):

  • whisper.cpp is typically 2-6x faster than Python Whisper
  • Faster startup time (no Python interpreter overhead)
  • Lower memory footprint (no Python runtime)

Real-world factors that affect performance:

  • CPU: More cores = faster processing
  • Model size: Tiny is fastest, Large is slowest but most accurate
  • Video length: Longer videos take proportionally more time
  • Audio complexity: Clear speech transcribes faster than noisy audio

Want to help?

We're collecting real benchmark data! If you run both versions, please share your results:

  • Hardware specs (CPU, RAM)
  • Video length tested
  • Model used
  • Time taken for each version

Open an issue with your benchmark results to help improve this section!
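A minimal way to capture comparable wall-clock timings for such a report (illustrative; substitute your actual transcription command for the placeholder):

```shell
# Wall-clock timing sketch for benchmark reports. Replace the
# placeholder line with the actual transcription invocation.
start=$(date +%s)
:  # placeholder: run the transcription here
end=$(date +%s)
echo "elapsed: $((end - start))s"
```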

๐ŸŽ›๏ธ Model Comparison

| Model | Speed | Accuracy | Memory | Use Case |
|---|---|---|---|---|
| tiny | ⚡⚡⚡⚡⚡ | ⭐⭐ | ~400 MB | Quick drafts, testing |
| base | ⚡⚡⚡⚡ | ⭐⭐⭐ | ~600 MB | General use (default) |
| small | ⚡⚡⚡ | ⭐⭐⭐⭐ | ~1.2 GB | Better accuracy |
| medium | ⚡⚡ | ⭐⭐⭐⭐⭐ | ~2.5 GB | High accuracy |
| large | ⚡ | ⭐⭐⭐⭐⭐ | ~4.8 GB | Best accuracy, slowest |

๐ŸŒ Supported Platforms

Thanks to yt-dlp, this tool supports 1000+ video platforms including:

  • Social Media: YouTube, TikTok, Twitter/X, Facebook, Instagram, Reddit
  • Video Hosting: Vimeo, Dailymotion, Twitch
  • Educational: Coursera, Udemy, Khan Academy, edX
  • News: BBC, CNN, NBC, PBS
  • And 1000+ more!

๐Ÿ“ Output Format

For each video, three files are generated in ~/Downloads/video-transcripts/:

video-id-title.txt   # Plain text transcript
video-id-title.json  # JSON with metadata and timestamps
video-id-title.md    # Markdown with video info

Example Output

# How to Build Fast Software

**Video:** https://www.youtube.com/watch?v=example
**Platform:** YouTube
**Channel:** Tech Channel
**Duration:** 600s

---

## Transcript

The key to building fast software is understanding...

---

*Transcribed using whisper.cpp (Rust) - Model: base*

🔧 Configuration

Environment Variables

# Custom models directory
export WHISPER_MODELS_DIR=~/.local/share/whisper-models

# Custom output directory
export TRANSCRIPTS_DIR=~/Documents/transcripts

# Log level
export RUST_LOG=info  # or debug, warn, error
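How these variables likely compose with the defaults documented elsewhere in this README (a sketch of the fallback logic, not the server's actual code):

```shell
# Resolve effective settings: unset variables fall back to the
# default locations documented in this README.
MODELS_DIR="${WHISPER_MODELS_DIR:-$HOME/.cache/video-transcriber-mcp/models}"
OUT_DIR="${TRANSCRIPTS_DIR:-$HOME/Downloads/video-transcripts}"
LOG_LEVEL="${RUST_LOG:-info}"
echo "models:  $MODELS_DIR"
echo "output:  $OUT_DIR"
echo "logging: $LOG_LEVEL"
```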

🧪 Development

Build

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Run tests
cargo test

# Run with logging
RUST_LOG=debug cargo run -- --url "https://youtube.com/watch?v=example"

Project Structure

video-transcriber-mcp/
├── src/
│   ├── main.rs              # Entry point
│   ├── mcp/                 # MCP server implementation
│   │   ├── server.rs
│   │   └── types.rs
│   ├── transcriber/         # Core transcription logic
│   │   ├── engine.rs        # Main transcription orchestrator
│   │   ├── whisper.rs       # whisper.cpp integration
│   │   ├── downloader.rs    # yt-dlp wrapper
│   │   ├── audio.rs         # Audio processing
│   │   └── types.rs         # Data structures
│   └── utils/               # Utilities
│       └── paths.rs
├── scripts/                 # Helper scripts
│   └── download-models.sh   # Download Whisper models
├── Cargo.toml               # Rust dependencies
└── README.md

๐Ÿค Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details

๐Ÿ™ Acknowledgments

🆚 Comparison with TypeScript Version

I built the original video-transcriber-mcp in TypeScript. Here's why I rewrote it in Rust:

| Aspect | TypeScript Version | Rust Version |
|---|---|---|
| Transcription Speed | 5 min for a 10-min video | 50s (6x faster) |
| Memory Usage | ~2 GB | ~800 MB (2.5x less) |
| Startup Time | ~2s | <100ms (20x faster) |
| Binary Size | N/A (Node.js runtime) | ~8 MB standalone |
| Dependencies | Node.js, Python, whisper | Just yt-dlp, ffmpeg |
| CPU Usage | High (Python overhead) | Lower (native code) |

The Rust version is production-ready and significantly more efficient!

🔗 Links


Built with โค๏ธ in Rust for maximum performance
