hyprwhspr-rs

Crates.io	hyprwhspr-rs
lib.rs	hyprwhspr-rs
version	0.3.15
created_at	2025-10-21 17:36:13.82949+00
updated_at	2026-01-22 07:42:11.957105+00
description	Native speech-to-text voice dictation for Hyprland (Rust implementation)
homepage	https://github.com/better-slop/hyprwhspr-rs
repository	https://github.com/better-slop/hyprwhspr-rs
max_upload_size
id	1894189
size	561,095

zero (cau1k)

documentation

https://github.com/better-slop/hyprwhspr-rs#readme

README

hyprwhspr-rs

Rust implementation of hyprwhspr | Blazing fast, native speech-to-text voice dictation for Hyprland and Omarchy | Waybar, Walker, and Elephant integrations (optional)

cargo install hyprwhspr-rs

https://github.com/user-attachments/assets/bbbaa1c3-1a7e-4165-ad3d-27b7465e201a

Requirements

whisper.cpp (GitHub, AUR)
libudev + pkg-config (required for hotplug detection; libudev-dev on Debian/Ubuntu)
GNU-only binaries (no musl releases)
Groq or Gemini API key (optional)
- Groq with whisper is cheap (~$0.10 USD/month) and fast as hell. [Data Controls]
- Comparatively, Gemini is very slow but offers better output formatting.
Parakeet TDT (optional) - NVIDIA's local ASR model via ONNX
- Run ./scripts/download-parakeet-tdt.sh to download model files (~1.2GB)
- Very fast, but not as accurate as whisper or Gemini

Features

Fast speech-to-text
Intuitive configuration
- word overrides (many are already baked in)
- multi provider support
- hot reloading during runtime
Optional fast VAD trims (fast_vad.enabled) audio files, reducing inferences costs while increasing output speed

Built for Hyprland

Detects Hyprland via HYPRLAND_INSTANCE_SIGNATURE and opens the IPC socket at $XDG_RUNTIME_DIR/hypr/<signature>/.socket.sock.
Execs dispatch sendshortcut commands against the active window to paste dictated text, inspecting activewindow to decide when Shift is required for a hardcoded list of programs.
Falls back to a Wayland virtual keyboard client or a simulated keypress paste if IPC communication fails.

Installation

From crates.io

Install the latest release from crates.io

cargo install hyprwhspr-rs

Install systemd service and Waybar module (optionally, with a WIP elephant/walker menu using --with-elephant flag)

# Interactive install
hyprwhspr-rs install

# Optionally, install specific components (systemd, waybar, elephant)
hyprwhspr-rs install {--all| --service | --waybar | --elephant} {--force | -f}

From source

git clone https://github.com/better-slop/hyprwhispr-rs.git
cd hyprwhspr-rs
cargo build --release
sudo cp target/release/hyprwhspr-rs /usr/local/bin/

Waybar Integration

./scripts/install-waybar.sh

Installs systemd service, Waybar module, and CSS styles. Shows mic status in your bar.

Development

git clone https://github.com/better-slop/hyprwhispr-rs.git
cd hyprwhspr-rs
cargo build --release
Run using:
- pretty logs: RUST_LOG=debug ./target/release/hyprwhspr-rs
- production release: ./target/release/hyprwhspr-rs

Example config

Configure in ~/.config/hyprwhspr-rs/config.jsonc

{
  "shortcuts": {
    "press": "SUPER+ALT+D",
    "hold": "SUPER+ALT+CTRL",
  },
  "word_overrides": {
    "under score": "_",
    "em dash": "—",
    "equal": "=",
    "at sign": "@",
    "pound": "#",
    "hashtag": "#",
    "hash tag": "#",
    "newline": "\n",
    "Omarkey": "Omarchy",
    "dot": ".",
    "Hyperland": "hyprland",
    "hyperland": "hyprland",
  },
  "audio_feedback": true, // Play start/stop sounds while recording
  "start_sound_volume": 0.1, // 0.1 - 1.0
  "stop_sound_volume": 0.1, // 0.1 - 1.0
  "start_sound_path": null, // Optional custom audio asset overrides
  "stop_sound_path": null, // Optional custom audio asset overrides
  "auto_copy_clipboard": true, // Automatically copy the final transcription to the clipboard
  "shift_paste": false, // Whether to force shift paste
  "global_paste_shortcut": false, // Enable the compositor-level paste shortcut (Omarchy's addition)
  "paste_hints": {
    "shift": [
      // Optional list of Hyprland window classes that should always paste with Ctrl+Shift+V
    ],
  },
  "audio_device": null, // Force a specific input device index (null uses system default)
  "fast_vad": {
    "enabled": false, // Enable Earshot fast VAD trimming
    "profile": "aggressive", // quality | low_bitrate | aggressive | very_aggressive (lowercase only, serde-enforced; default aggressive)
    "min_speech_ms": 120, // Minimum detected speech before keeping a segment
    "silence_timeout_ms": 500, // Drop silence longer than this (ms)
    "pre_roll_ms": 120, // Audio to keep before speech to avoid clipping words
    "post_roll_ms": 150, // Audio to keep after speech before trimming
    "volatility_window": 24, // Frames observed for adaptive aggressiveness (30 ms per frame, matches FRAME_MS in src/audio/vad.rs)
    "volatility_increase_threshold": 0.35, // Bump profile when toggles exceed this ratio
    "volatility_decrease_threshold": 0.12, // Relax profile when toggles stay below this ratio
  },
  "transcription": {
    "provider": "whisper_cpp", // whisper_cpp | groq | gemini | parakeet
    "request_timeout_secs": 45,
    "max_retries": 2,
    "whisper_cpp": {
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
      "model": "large-v3-turbo-q8_0", // Whisper model to use (must exist in specified directories)
      "threads": 4, // CPU threads dedicated to whisper.cpp
      "gpu_layers": 999, // Number of layers to keep on GPU (999 = auto/GPU preferred)
      "fallback_cli": false, // Fallback to whisper-cli (uses CPU)
      "no_speech_threshold": 0.6, // Whisper's "no speech" confidence gate
      "models_dirs": ["~/.config/hyprwhspr-rs/models"], // Directories to search for models
      "vad": {
        "enabled": false, // Toggle whisper-cli's native Silero VAD
        "model": "ggml-silero-v5.1.2.bin", // Path or filename for the ggml Silero VAD model
        // Probability threshold for deciding a frame is speech. Higher = fewer false positives, but may miss quiet speech.
        "threshold": 0.5,
        // Minimum contiguous speech duration (ms) to accept. Increase to ignore quick clicks/taps.
        "min_speech_ms": 250,
        // Minimum silence gap (ms) required to end a speech segment. Raise if mid-sentence pauses are being split.
        "min_silence_ms": 120,
        // Maximum speech duration (seconds) before forcing a cut. Use null (or omit) to leave unlimited.
        "max_speech_s": 15.0,
        // Extra padding (ms) added before/after detected speech so words aren't clipped.
        "speech_pad_ms": 80,
        // Overlap ratio between segments. Higher overlap helps smooth transitions at the cost of a little extra decode time.
        "samples_overlap": 0.1,
      },
    },
    "groq": {
      "model": "whisper-large-v3-turbo",
      "endpoint": "https://api.groq.com/openai/v1/audio/transcriptions",
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
    "gemini": {
      "model": "gemini-2.5-flash-preview-09-2025",
      "endpoint": "https://generativelanguage.googleapis.com/v1beta/models",
      "temperature": 0.0,
      "max_output_tokens": 1024,
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
    "parakeet": {
      "model_dir": "models/parakeet/parakeet-tdt-0.6b-v3-onnx", // Relative to $XDG_DATA_HOME/hyprwhspr-rs (or ~/.local/share/hyprwhspr-rs)
      "prompt": "Transcribe as technical documentation with proper capitalization, acronyms, and technical terminology. Do not add punctuation.",
    },
  },
}

Earshot VAD trimming (optional)

The default build ships with the earshot VoiceActivityDetector baked in. Toggle fast_vad.enabled in your config to trim silence before any provider (whisper.cpp, Groq, Gemini) sees the audio. Extremely useful for lowering costs and increasing speed.

Operates on the 16 kHz PCM emitted by the capture layer and shares the trimmed buffer across all providers.
Drops silent stretches longer than the configured timeout while keeping configurable pre-roll and post-roll padding so word edges remain intact.
Adapts Earshot’s aggressiveness based on recent speech/silence volatility—fewer uploads when the room is noisy.
If an entire recording is silent, the app short-circuits the upload path instead of dispatching an empty request.

All other fields in the fast_vad block map directly to the trimmer’s behaviour, so you can tune aggressiveness without recompiling.

Release process

Runtime builds rely on a local whisper.cpp installation, so validate that dependency before shipping a tagged version.

Use Conventional Commits – fix: bumps patch, feat: bumps minor, and type!: indicates a breaking change (major bump).
On every push to main, the release-plz workflow runs release-pr to open or refresh a release-plz-* pull request. Review the proposed version and changelog there.
When the release PR looks good, merge it. The same workflow runs release-plz release, tagging (vX.Y.Z) and publishing the crate to crates.io if it’s a stable tag.
The tag triggers the release workflow, which builds the Linux GNU binary, uploads the tarball + checksum, and publishes the GitHub release with the full commit list (plus PR links when available).

Define CARGO_REGISTRY_TOKEN in the repository secrets with publish-only permissions so the workflow can push stable releases to crates.io.

To Do

Slop review/clean up
Ship waybar integration
Release on Cargo
Release on AUR
Add support for other operating systems/setups
Refine paste layer
Investigate formatting model

Commit count: 139

hyprwhspr-rs

documentation

README

hyprwhspr-rs

Requirements

Features

Built for Hyprland

Installation

From crates.io

From source

Waybar Integration

Development

To Do

cargo fmt