howfast

crates.io: howfast
lib.rs: howfast
version: 0.1.1
created_at: 2025-09-23 18:55:55.994375+00
updated_at: 2025-09-23 19:03:19.581738+00
description: Small CLI tool that measures token metrics and completion tokens per second for an Ollama model response
homepage: https://github.com/spinualexandru/howfast
repository: https://github.com/spinualexandru/howfast
id: 1851971
size: 93,087
Alexandru Spînu (spinualexandru)


A small CLI tool that measures token metrics (prompt tokens, completion tokens, total tokens, and tokens-per-second) for an Ollama model response. It queries an Ollama server, then prints a nicely formatted, colored summary. The actual model text response is hidden by default; pass --with-response to show it.

Installation

cargo install howfast

Features

  • Call an Ollama model and collect token metrics.
  • Report completion tokens per second.
  • Nicely formatted, colored terminal output (no box/borders).
  • Response hidden by default; opt-in with --with-response.
  • Uses Tokio runtime (required by ollama-rs).
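
The token metrics listed above can be derived from the counters that Ollama includes in its final response object: `prompt_eval_count`, `eval_count`, and `eval_duration` (in nanoseconds). The sketch below shows one way to compute them. It is an illustration only: the `OllamaTimings` struct and its method names are hypothetical and not taken from the howfast source, and it assumes "prompt tokens" and "completion tokens" map to `prompt_eval_count` and `eval_count` respectively.

```rust
/// Counters copied from the final Ollama response object.
/// Hypothetical helper type for illustration; not part of howfast.
#[derive(Debug, Clone, Copy)]
struct OllamaTimings {
    /// Number of tokens in the prompt (`prompt_eval_count`).
    prompt_eval_count: u64,
    /// Number of tokens generated (`eval_count`).
    eval_count: u64,
    /// Time spent generating the response, in nanoseconds (`eval_duration`).
    eval_duration_ns: u64,
}

impl OllamaTimings {
    /// Prompt tokens plus completion tokens.
    fn total_tokens(&self) -> u64 {
        self.prompt_eval_count + self.eval_count
    }

    /// Completion tokens per second: eval_count / eval_duration * 10^9.
    /// Returns `None` when no generation time was reported, to avoid
    /// dividing by zero.
    fn completion_tokens_per_second(&self) -> Option<f64> {
        if self.eval_duration_ns == 0 {
            return None;
        }
        let seconds = self.eval_duration_ns as f64 / 1e9;
        Some(self.eval_count as f64 / seconds)
    }
}
```

For example, 100 completion tokens generated in 2,000,000,000 ns (2 seconds) works out to 50 tokens per second.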

Requirements

  • Rust (1.70+ recommended)
  • Cargo
  • An Ollama server reachable from your machine (default localhost:11434)

Environment

  • OLLAMA_HOST (optional) — host address of the Ollama server (defaults to localhost). Do not include the port; the program always uses port 11434.

Example:

# set before running, if your Ollama server isn't on localhost
export OLLAMA_HOST=192.168.1.100

or, for a single invocation:

OLLAMA_HOST=192.168.1.100 howfast gemma3:4b "Tell me a joke"
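
The host resolution described above (optional `OLLAMA_HOST`, default `localhost`, fixed port 11434) could be sketched roughly as follows. This is not the howfast implementation: the function names are hypothetical, and the `http://` scheme and the treatment of an empty value as "unset" are assumptions made for the example.

```rust
use std::env;

/// The port is fixed; OLLAMA_HOST must not include one.
const OLLAMA_PORT: u16 = 11434;

/// Pick the host from an optional OLLAMA_HOST value, falling back to
/// "localhost" when it is unset or blank (the blank case is an assumption).
fn resolve_host(value: Option<&str>) -> String {
    match value.map(str::trim) {
        Some(host) if !host.is_empty() => host.to_string(),
        _ => "localhost".to_string(),
    }
}

/// Build the server base URL. The "http://" scheme is an assumption.
fn base_url(host: &str) -> String {
    format!("http://{}:{}", host, OLLAMA_PORT)
}

/// Read OLLAMA_HOST from the process environment and build the base URL.
#[allow(dead_code)]
fn base_url_from_env() -> String {
    let value = env::var("OLLAMA_HOST").ok();
    base_url(&resolve_host(value.as_deref()))
}
```

With `OLLAMA_HOST=192.168.1.100`, this sketch yields `http://192.168.1.100:11434`; with the variable unset, it yields `http://localhost:11434`.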