Benchmark
RUST LIBRARY


Nanosecond-precision benchmarking for development, testing, and production. The core timing path is zero-overhead when disabled, while optional, std-powered collectors and zero-dependency metrics (Watch/Timer and the stopwatch! macro) provide real service observability with percentiles in production. Designed to be embedded in performance-critical code without bloat or footguns.


What is Benchmark?

Benchmark is a comprehensive benchmarking suite with an advanced performance metrics module that functions in two distinct contexts:

1: As a Development Tool:

The Benchmarking Suite is a statistical framework for performance testing and optimization during development. It provides tools for microbenchmarking and comparative analysis, allowing you to detect performance regressions in CI and make data-driven implementation choices.

Benchmarking Suite Features

A lightweight, ergonomic alternative to Criterion for statistical performance testing:

— See Benchmark Documentation for more information.
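
For example, a quick comparison of two candidate implementations might look like the following. This is a minimal sketch built from the benchmark! macro and Collector shown in the Quick Start below; it assumes the Measurement values returned by benchmark! can be recorded into a Collector just like those produced by time_named!.

use benchmark::{benchmark, Collector};

// Two candidate implementations to compare.
fn sum_fold(n: u64) -> u64 {
    (0..n).fold(0, |acc, i| acc + i)
}
fn sum_formula(n: u64) -> u64 {
    n * (n - 1) / 2
}

let collector = Collector::new();

// Run each candidate for an explicit number of iterations and aggregate.
let (_, ms) = benchmark!("sum_fold", 1_000usize, { sum_fold(10_000) });
for m in &ms {
    collector.record(m);
}
let (_, ms) = benchmark!("sum_formula", 1_000usize, { sum_formula(10_000) });
for m in &ms {
    collector.record(m);
}

// Compare mean timings to make a data-driven choice (or fail CI on a regression).
let fold = collector.stats("sum_fold").unwrap();
let formula = collector.stats("sum_formula").unwrap();
println!(
    "fold mean = {} ns, formula mean = {} ns",
    fold.mean.as_nanos(),
    formula.mean.as_nanos()
);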


2: As a Production Tool:

The performance metrics module serves as an advanced, lightweight application performance monitoring (APM) tool that provides seamless production observability. It lets you instrument your application to capture real-time performance metrics for critical operations such as database queries and API response times. Each timing measurement can be thought of as a span, helping you identify bottlenecks in a live system.

Performance Metrics Features

Low-overhead instrumentation for live application monitoring:

— See Metrics Documentation for more information.
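
For example, a service can hold a Watch and instrument each database call with a Timer. This is a minimal sketch (the Repo type and query body are hypothetical); it uses the Watch, Timer, and snapshot APIs shown under Usage below.

use benchmark::{Watch, Timer};

// Hypothetical repository that shares a Watch for live metrics.
struct Repo {
    watch: Watch,
}

impl Repo {
    fn find_user(&self, id: u64) -> Option<String> {
        // Records one "db.query" measurement when the Timer drops,
        // even if the function returns early.
        let _t = Timer::new(self.watch.clone(), "db.query");
        // ... run the real query here ...
        Some(format!("user-{}", id))
    }
}

let repo = Repo { watch: Watch::new() };
repo.find_user(42);

// Read live stats, e.g. from a health or metrics endpoint.
let snapshot = repo.watch.snapshot();
println!("db.query count = {}", snapshot["db.query"].count);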



Features

  • True Zero-Overhead: When disabled via default-features = false, all benchmarking code compiles away completely, adding zero bytes to your binary and zero nanoseconds to execution time.
  • No Dependencies: Built using only the Rust standard library, eliminating dependency conflicts and keeping compilation times fast.
  • Thread-Safe by Design: Core measurement functions are pure and inherently thread-safe, with an optional thread-safe Collector for aggregating measurements across threads.
  • Async Compatible: Works seamlessly with any async runtime (Tokio, async-std, etc.) without special support or additional features - just time any expression, sync or async.
  • Nanosecond Precision: Uses platform-specific high-resolution timers through std::time::Instant, providing nanosecond-precision measurements with monotonic guarantees.
  • Simple Statistics: Provides essential statistics (count, total, min, max, mean) without complex algorithms or memory allocation, keeping the library focused and efficient.
  • Production Metrics (optional): Enable the metrics feature for a thread-safe Watch, Timer (auto record on drop), and stopwatch! macro using a built-in zero-dependency histogram for percentiles.
  • Minimal API Surface: Just four functions and two macros - easy to learn, hard to misuse, and unlikely to ever need breaking changes.
  • Cross-Platform: Consistent behavior across Linux, macOS, Windows, and other platforms supported by Rust's standard library.


Feature Flags

  • none: no features enabled.
  • std (default): uses the Rust standard library (disables no_std).
  • benchmark (default): enables the default benchmark measurement path.
  • metrics (optional): production/live metrics (Watch, Timer, stopwatch!).
  • default: convenience feature equal to std + benchmark.
  • standard: convenience feature equal to std + benchmark + metrics.
  • minimal: minimal build with core timing only (no default features).
  • all: activates all features (std + benchmark + metrics).

— See FEATURES DOCUMENTATION for more information.
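
For example, assuming the convenience features above are exposed exactly as named, enabling everything looks like this:

[dependencies]
benchmark = { version = "0.6.0", features = ["all"] }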




Usage:


Installation

Add this to your Cargo.toml:

[dependencies]
benchmark = "0.6.0"

Standard Features

Enables the standard feature set (std + benchmark + metrics) for both development and production use.

[dependencies]

# Enables Production & Development.
benchmark = { version = "0.6.0", features = ["standard"] }

Production metrics (std + metrics)

Enable production observability using Watch/Timer or the stopwatch! macro.

Cargo features:

[dependencies]
benchmark = { version = "0.6.0", features = ["std", "metrics"] }

Record with Timer (auto-record on drop):

use benchmark::{Watch, Timer};

let watch = Watch::new();
{
    let _t = Timer::new(watch.clone(), "db.query");
    // ... do the work to be measured ...
} // recorded once on drop

let s = &watch.snapshot()["db.query"];
assert!(s.count >= 1);

Or use the stopwatch! macro:

use benchmark::{Watch, stopwatch};

let watch = Watch::new();
stopwatch!(watch, "render", {
    // ... work to measure ...
});
assert!(watch.snapshot()["render"].count >= 1);

Disable Default Features

True zero-overhead core timing only.

[dependencies]
# Disable default features for true zero-overhead
benchmark = { version = "0.6.0", default-features = false }

— See FEATURES DOCUMENTATION for more information.



Quick Start

See also: Async Usage · Disabled Mode Behavior · Production Metrics

Measure a closure

Use measure() to time any closure and get back (result, Duration).

use benchmark::measure;

let (value, duration) = measure(|| 2 + 2);
assert_eq!(value, 4);
println!("took {} ns", duration.as_nanos());

Time an expression with the macro

time! works with any expression and supports async contexts.

use benchmark::time;

let (value, duration) = time!({
    let mut sum = 0;
    for i in 0..10_000 { sum += i; }
    sum
});
assert!(duration.as_nanos() > 0);

Micro-benchmark a code block

Use benchmark_block! to run a block many times and get raw per-iteration durations.

use benchmark::benchmark_block;

// Default 10_000 iterations
let samples = benchmark_block!({
    // hot path
    std::hint::black_box(1 + 1);
});
assert_eq!(samples.len(), 10_000);

// Explicit iterations
let samples = benchmark_block!(1_000usize, {
    std::hint::black_box(2 * 3);
});

Macro-benchmark a named expression

Use benchmark! to run a named expression repeatedly and get (last, Vec<Measurement>).

use benchmark::benchmark;

// Default 10_000 iterations
let (last, ms) = benchmark!("add", { 2 + 3 });
assert_eq!(last, Some(5));
assert_eq!(ms[0].name, "add");

// Explicit iterations
let (_last, ms) = benchmark!("mul", 77usize, { 6 * 7 });
assert_eq!(ms.len(), 77);

Disabled mode (`default-features = false`): `benchmark_block!` runs once and returns `vec![]`; `benchmark!` runs once and returns `(Some(result), vec![])`.

Named timing + Collector aggregation (std + benchmark)

Record a named measurement and aggregate stats with Collector.

use benchmark::{time_named, Collector};

fn work() {
    std::thread::sleep(std::time::Duration::from_millis(1));
}

let collector = Collector::new();
let (_, m) = time_named!("work", work());
collector.record(&m);

let stats = collector.stats("work").unwrap();
println!(
    "count={} total={}ns min={}ns max={}ns mean={}ns",
    stats.count,
    stats.total.as_nanos(),
    stats.min.as_nanos(),
    stats.max.as_nanos(),
    stats.mean.as_nanos()
);
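
Because Collector is thread-safe (see Features above), one instance can aggregate measurements from multiple threads. A minimal sketch, assuming a Collector shared behind an Arc:

use std::sync::Arc;
use benchmark::{time_named, Collector};

let collector = Arc::new(Collector::new());

// Four worker threads each record one named measurement.
let handles: Vec<_> = (0..4)
    .map(|_| {
        let collector = Arc::clone(&collector);
        std::thread::spawn(move || {
            let (_, m) = time_named!("hash", {
                std::hint::black_box((0..1_000u64).sum::<u64>())
            });
            collector.record(&m);
        })
    })
    .collect();

for h in handles {
    h.join().unwrap();
}

let stats = collector.stats("hash").unwrap();
assert_eq!(stats.count, 4);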

Async timing with await

The macros inline Instant timing when features = ["std", "benchmark"], so awaiting inside works seamlessly.

use benchmark::time;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let sleep_dur = std::time::Duration::from_millis(10);
    let ((), d) = time!(tokio::time::sleep(sleep_dur).await);
    println!("slept ~{} ms", d.as_millis());
}

Benchmarks

This repository includes Criterion benchmarks that measure the overhead of the public API compared to a direct Instant::now() baseline.
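
The shape of such a bench, as a minimal illustrative sketch (not the actual benches/overhead.rs source):

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;
use benchmark::measure;

fn overhead(c: &mut Criterion) {
    // Baseline: raw Instant::now().elapsed() around trivial work.
    c.bench_function("instant_now_elapsed", |b| {
        b.iter(|| {
            let start = std::time::Instant::now();
            black_box(black_box(1u64) + black_box(1u64));
            black_box(start.elapsed())
        })
    });

    // Library path: measure() around the same trivial work.
    c.bench_function("measure_closure_add", |b| {
        b.iter(|| black_box(measure(|| black_box(1u64) + black_box(1u64))))
    });
}

criterion_group!(overhead_benches, overhead);
criterion_main!(overhead_benches);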


Safety & Edge Cases

  • Zero durations: Some operations can complete so fast that elapsed() may be 0ns on some platforms. The API preserves this and returns Duration::ZERO. If you need a floor for visualization, clamp in your presentation layer.
  • Saturating conversions: Watch::record_instant() converts as_nanos() to u64 using saturating semantics, avoiding panics on extremely large values.
  • Range clamping: Watch::record() clamps input to histogram bounds, ensuring valid recording and protecting against out-of-range values.
  • Percentile input clamping: Percentile queries (e.g., Histogram::percentile(q)) clamp q to [0.0, 1.0]. Out-of-range inputs map to min/max (e.g., -0.1 → 0.0, 1.2 → 1.0).
  • Drop safety: Timer records exactly once. It guards against double-record by storing Option<Instant> and recording on Drop even during unwinding.
  • Empty datasets: Watch::snapshot() and Collector handle empty sets defensively. Snapshots for empty histograms return zeros; Collector::stats() returns None for missing keys.
  • Overflow protection: Collector uses saturating_add for total accumulation and 128-bit nanosecond storage in Duration to provide ample headroom.
  • Thread safety: All shared structures use RwLock with short hold-times: clone under read lock, compute outside the lock. Methods will panic only if a lock is poisoned by a prior panic.
  • Feature gating: Production metrics are gated behind features=["std","metrics"]. Disable default features to make all timing a no-op for zero-overhead builds.

How to run

cargo bench

Zero-overhead Proof

This project includes automated checks that verify zero bytes and zero time when disabled, and provide assembly and runtime comparison artifacts.

  • Size comparison (CI): Workflow .github/workflows/ci.yml, job Zero Overhead builds the crate twice and compares libbenchmark.rlib sizes with and without benchmark. Any unexpected growth fails the job.
  • Assembly artifacts (CI): Job Assembly Inspection emits objdump/llvm-objdump symbol tables and disassembly for enabled vs disabled. Download the assembly-artifacts artifact from the run to inspect.
  • Runtime comparison (CI): Bench workflow .github/workflows/bench.yml runs the example examples/overhead_compare.rs in both modes and uploads overhead-compare artifacts with raw output.
  • Compile tests (trybuild): tests/trybuild_disabled.rs ensures disabled mode compiles with the macro and function APIs used in a minimal program.
  • Local reproduction:
    • Disabled: cargo run --release --example overhead_compare --no-default-features
    • Benchmark: cargo run --release --example overhead_compare --no-default-features --features "std benchmark"

When features are disabled (default-features = false), timing returns Duration::ZERO and produces no runtime cost (e.g., time_macro_ns=0 on CI-hosted runners), validating the zero-overhead guarantee.

Sample results (illustrative)

Results below are from a recent run on GitHub-hosted Linux runners; your numbers will vary by hardware and load.

Overhead
--------
instant_now_elapsed           ~ 81–89 ns
measure_closure_add           ~ 79–81 ns
time_macro_add                ~ 81–82 ns

Collector Statistics

stats::single/1000        ~ 2.44–2.61 µs
stats::single/10000       ~ 26.7–29.1 µs
stats::all/k10_n1000      ~ 29.4–33.6 µs
stats::all/k50_n1000      ~ 148.6–157.1 µs

Array Baseline (no locks)

stats::array/k1_n10000    ~ 17.15–17.90 µs
stats::array/k10_n1000    ~ 15.03–16.25 µs

Interpretation

  • Macro/function overhead is on par with direct Instant usage for trivial work, as expected.
  • Collector stats scale roughly linearly with the number of samples; costs are dominated by iteration and min/max/accumulate.
  • No-lock array baseline provides a lower bound for aggregation cost; the difference vs Collector indicates lock and map overhead.
  • Use --features std,benchmark to ensure the benchmark timing path is used.

Benchmarks included

  • overhead::instant: Instant::now().elapsed() baseline.
  • overhead::measure: measure(|| expr).
  • overhead::time_macro: time!(expr).
  • Bench source: benches/overhead.rs.
  • Criterion version: 0.5.
  • Note: when features are disabled (default-features = false), measurement returns Duration::ZERO.
  • Tip: use --features std,benchmark to ensure the benchmark path for benches.
  • Environment affects results; run on a quiet system with performance governor if possible.

Sample output (placeholder)

overhead::instant/instant_now_elapsed
                        time:   [X ns  ..  Y ns  ..  Z ns]

overhead::measure/measure_closure_add
                        time:   [X ns  ..  Y ns  ..  Z ns]

overhead::time_macro/time_macro_add
                        time:   [X ns  ..  Y ns  ..  Z ns]


Documentation:


DEVELOPMENT & CONTRIBUTION

We welcome contributions! Please see CONTRIBUTING.md for guidelines.



⚖️ License

Licensed under the Apache License, version 2.0 (the "License"); you may not use this software, including, but not limited to, the source code, media files, ideas, techniques, or any other associated property or concept belonging to, associated with, or otherwise packaged with this software, except in compliance with the License.

You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the LICENSE file included with this project for the specific language governing permissions and limitations under the License.


COPYRIGHT © 2025 JAMES GOBER.
