armchair

version: 0.3.0
created_at: 2025-10-16 08:05:26.91467+00
updated_at: 2026-01-21 08:57:39.554605+00
description: Concurrency benchmarking tool for Rime TTS services
id: 1885600
size: 123,636
Ryan Li (ryanli)

Armchair

Armchair is a load test binary that can be used to benchmark Rime's TTS service with concurrent requests.

Primary use cases:

  • To find the time-to-first-byte (TTFB) and real-time factor (RTF) for a given concurrency level.
  • To find the maximum concurrency that satisfies the given performance targets.
  • To find an optimal client-side buffer size to avoid underrun issues.

For audio streaming with concurrent sessions, TTFB and RTF are the key performance indicators. Real-time streaming requires RTF to stay under 1, so a maximum concurrency is typically imposed to keep it there.

Supported features:

  • Bisection to find maximum concurrency based on configurable performance target (success, TTFB, RTF)
  • Time-to-first-byte (TTFB) and RTF metrics for a given concurrency
  • Session start staggering via exponential distribution
  • Intra-session delays via truncated normal distribution with playback-aware waiting
  • Client-side buffer simulation and underrun detection

Methodology

This tool simulates many concurrent streaming sessions and evaluates performance against a configurable target.

  • Session model:

    • At a given concurrency C, C sessions are launched.
    • Session starts are staggered by an exponential inter-arrival process with rate λ (--session-rate, starts/second).
    • Each session performs -n/--requests-per-session sequential requests.
  • Per-request timing and metrics:

    • TTFB is measured from request send to the first received byte.
    • Elapsed is the total time to stream the entire response.
    • Audio duration is parsed from the WAV headers; if parsing fails the request is treated as non-audio for RTF purposes.
    • RTF is computed as (elapsed − TTFB) / audio_duration. Requests with a missing or invalid audio duration are excluded from the RTF percentiles.
  • Intra-session delay model (traffic shaping):

    • After each request completes, the tool waits out any remaining playback time if the audio was synthesized faster than real time, i.e. max(0, audio_duration − (elapsed − TTFB)).
    • Then it sleeps an additional delay sampled from a Normal distribution with parameters --intra-session-delay-mu and --intra-session-delay-sigma, truncated to [--intra-session-delay-min, --intra-session-delay-max].
    • The first request in a session has no intra-session delay; session start staggering is controlled by the exponential process above.
  • Buffer underrun detection:

    • Simulates a client-side buffer of size --client-buffer (default 0ms).
    • Playback starts once the buffer is full.
    • An underrun occurs if the buffer empties before playback completes.
    • Requires valid WAV headers to determine the byte rate.
  • Aggregation and statistics:

    • Success is counted when HTTP status is 2xx and the body is non-empty.
    • Percentiles (p50/p90/p95/p99) are computed via linear interpolation over sorted samples; NaN/invalid values are excluded from the relevant metric’s distribution.
    • A startup config dump prints all key parameters for reproducibility.
  • Performance target evaluation (--target):

    • The target is a conjunction: all configured clauses must pass.
    • Supported clauses: success:<fraction>, ttfb:pXX@<duration>, rtf:pXX@<value>, underrun:<fraction>.
    • If a metric cannot be computed (e.g., no valid audio for RTF), that clause fails.
    • Results show OK/FAIL per clause, with color when the terminal supports it.
  • Maximum concurrency search (when --concurrency is omitted):

    • Exponential growth: repeatedly doubles concurrency (1, 2, 4, …) until the performance target fails; waits 10s between trials.
    • Binary search: bisects between the last known-good and the first failing concurrency to find the largest concurrency that still satisfies the target.
    • After discovery, a final run at the chosen concurrency prints a full summary.
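The two search phases described above can be sketched as follows. This is a minimal illustration, not armchair's actual code: `find_max_concurrency`, the `meets_target` callback (which stands in for running one trial and evaluating `--target`), and the `limit` safety cap are all hypothetical names, and the 10s wait between trials is omitted.

```rust
/// Returns the largest concurrency for which `meets_target` passes,
/// or None if even a concurrency of 1 fails.
/// `limit` is a hypothetical safety cap, not an armchair option.
fn find_max_concurrency<F>(mut meets_target: F, limit: u32) -> Option<u32>
where
    F: FnMut(u32) -> bool,
{
    // Phase 1: exponential growth (1, 2, 4, ...) until the target fails.
    let mut good: u32 = 0; // 0 means "no passing concurrency seen yet"
    let mut probe: u32 = 1;
    let bad = loop {
        if !meets_target(probe) {
            break probe; // first failing concurrency
        }
        good = probe;
        if probe >= limit {
            return Some(good); // never failed within the cap
        }
        probe = probe.saturating_mul(2).min(limit);
    };

    // Phase 2: bisection between last known-good and first failing.
    let (mut lo, mut hi) = (good, bad);
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if meets_target(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    if lo == 0 { None } else { Some(lo) }
}
```

For example, `find_max_concurrency(|c| c <= 21, 1024)` probes 1, 2, 4, 8, 16, 32 and then bisects between 16 and 32. Because real trials are stochastic, a callback backed by live measurements may not be monotonic in the concurrency, so the result is best read as an estimate.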

Note: The traffic and delay processes are stochastic; repeated runs will vary. Randomness is seeded from system entropy.
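The per-request formulas from the methodology, RTF and the playback-aware wait, can be written down in a few lines. The function names `rtf` and `playback_wait` are hypothetical, and treating a zero-length audio duration as invalid is an assumption of this sketch.

```rust
use std::time::Duration;

/// RTF = (elapsed - TTFB) / audio_duration.
/// Returns None when the audio duration is missing or zero, so the
/// request can be excluded from RTF percentiles.
fn rtf(elapsed: Duration, ttfb: Duration, audio: Option<Duration>) -> Option<f64> {
    let audio = audio.filter(|d| !d.is_zero())?;
    let streaming = elapsed.saturating_sub(ttfb);
    Some(streaming.as_secs_f64() / audio.as_secs_f64())
}

/// Remaining playback time: max(0, audio_duration - (elapsed - TTFB)).
fn playback_wait(elapsed: Duration, ttfb: Duration, audio: Duration) -> Duration {
    audio.saturating_sub(elapsed.saturating_sub(ttfb))
}
```

With elapsed = 2100ms, TTFB = 100ms, and 4s of audio, the RTF is 0.5 and the session would wait a further 2s before sampling its intra-session delay. When streaming is slower than real time, the wait is zero.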

Usage

Installation

cargo install armchair

Maximum concurrency

To find the maximum concurrency where each session sends 5 requests with:

  • session starts following an exponential process (lambda=5 starts/sec)
  • intra-session delays sampled from a truncated normal N(mu=10s, sigma=5s), clamped to [0s, 20s]
  • performance targets success:1.00,ttfb:p99@1s,rtf:p99@1.00,underrun:0.00 (default)

armchair --url '<RIME_SERVICE>' --token '<RIME_API_KEY>'

The tool should then report metrics like:

=== MAXIMUM CONCURRENCY FOUND: 16 ===

...

----- Summary -----
concurrency: 16
total: 80 success: 80 (100.0%)
Buffer underrun: 0 (0.0%)
TTFB ms: mean=104.4 p50=100.8 p90=117.6 p95=126.2 p99=141.0
Elapsed ms: mean=13924.0 p50=13772.5 p90=16412.8 p95=17065.3 p99=18527.3
RTF: mean=1.067 p50=1.061 p90=1.170 p95=1.208 p99=1.254
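
The methodology describes the percentile columns as linear interpolation over sorted samples with NaN/invalid values excluded. Several interpolation conventions exist; the sketch below assumes the common one where the rank is p/100 × (n − 1), which may differ from what armchair uses. The function name `percentile` is hypothetical.

```rust
/// Percentile `p` (0..=100) by linear interpolation between the two
/// nearest sorted samples. Non-finite samples are dropped first.
/// Returns None if no valid samples remain or `p` is NaN.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    let mut xs: Vec<f64> = samples.iter().copied().filter(|x| x.is_finite()).collect();
    if xs.is_empty() || p.is_nan() {
        return None;
    }
    xs.sort_by(|a, b| a.total_cmp(b));
    // Fractional rank into the sorted samples (assumed convention).
    let rank = (p.clamp(0.0, 100.0) / 100.0) * (xs.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(xs[lo] + (xs[hi] - xs[lo]) * frac)
}
```

Under this convention, the samples 10, 20, 30, 40 have a p50 of 25, halfway between the two middle values.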

Fixed concurrency

When --concurrency is specified, the tool skips the maximum concurrency search and simply reports the metrics for that concurrency level.

Request customization

  • -n: Number of requests in each session, e.g. 5
  • --session-rate: Mean session starts per second (rate λ of the exponential inter-arrival process used to stagger starts), e.g. 5
  • --intra-session-delay-mu: Intra-session delay mean, e.g. 10s
  • --intra-session-delay-sigma: Intra-session delay standard deviation, e.g. 5s
  • --intra-session-delay-min: Intra-session delay minimum clamp, e.g. 0s
  • --intra-session-delay-max: Intra-session delay maximum clamp, e.g. 20s
  • --client-buffer: Client-side initial playback buffer, e.g. 100ms
  • --prepend-request-id: If set, prepend req-x-y to the request text to avoid cache hits (default: false)
  • --target: Performance target specification, e.g. success:1.00,ttfb:p90@500ms,rtf:p90@1.00,underrun:0.00
  • --percentiles: List of percentiles to report, e.g. 1,25,50,90,99
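
To make the traffic-shaping flags concrete, here is a rough sketch of how the session start gaps and intra-session delays could be sampled using only the standard library. Everything in it is an assumption for illustration: `SplitMix64` is a small stand-in generator with a fixed seed (armchair seeds from system entropy), `exp_gap` and `clamped_normal` are hypothetical names, and out-of-range normal samples are clamped to the bounds rather than resampled, since the README does not say which approach is used.

```rust
/// Small deterministic generator (SplitMix64), used here only as a stand-in.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Gap in seconds before the next session start, for rate `lambda`
/// (starts/second), via inverse-CDF sampling of the exponential distribution.
fn exp_gap(rng: &mut SplitMix64, lambda: f64) -> f64 {
    // 1 - u lies in (0, 1], so the logarithm is always finite.
    -(1.0 - rng.next_f64()).ln() / lambda
}

/// Normal(mu, sigma) sample in seconds via Box-Muller, clamped to [min, max].
fn clamped_normal(rng: &mut SplitMix64, mu: f64, sigma: f64, min: f64, max: f64) -> f64 {
    let u1 = 1.0 - rng.next_f64(); // (0, 1]
    let u2 = rng.next_f64();
    let z = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos();
    (mu + sigma * z).clamp(min, max)
}
```

With `--session-rate 5` the gaps average about 0.2s, and with mu=10s, sigma=5s, and bounds [0s, 20s] the delays average about 10s because the bounds are symmetric around the mean.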

Duration value syntax

Flags that accept durations (e.g., --intra-session-delay-mu) take values with units:

500ms, 1.5s, 10s
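
A parser for this syntax might look like the sketch below. It handles only the two units that appear in this README (`ms` and `s`); whether armchair accepts other units is not stated here, and the function name `parse_duration` is hypothetical.

```rust
use std::time::Duration;

/// Parses values such as "500ms", "1.5s", or "10s".
/// Returns None for a missing unit, a non-numeric value, or a negative value.
fn parse_duration(input: &str) -> Option<Duration> {
    let s = input.trim();
    // Check "ms" before "s", since every "ms" value also ends in "s".
    let (number, units_per_second) = if let Some(n) = s.strip_suffix("ms") {
        (n, 1000.0)
    } else if let Some(n) = s.strip_suffix('s') {
        (n, 1.0)
    } else {
        return None; // a unit is required
    };
    let value: f64 = number.trim().parse().ok()?;
    // try_from_secs_f64 rejects negative, NaN, and overflowing values.
    Duration::try_from_secs_f64(value / units_per_second).ok()
}
```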

Performance target flag

--target accepts a comma-separated list:

success:<fraction>,ttfb:pXX@<duration>,rtf:pXX@<value>,underrun:<fraction>

Examples:

--target success:0.99,ttfb:p95@800ms,rtf:p90@1.20,underrun:0.01
--target success:1.00,ttfb:p90@1s,rtf:p90@1.00,underrun:0.00
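
The clause grammar above is simple enough to parse by splitting on `,`, `:`, and `@`. The following sketch shows one way to do that; the `Clause` enum, its field names, and the helper functions are hypothetical and are not taken from armchair's source. Durations are limited to the `ms` and `s` units shown in this README.

```rust
#[derive(Debug, PartialEq)]
enum Clause {
    Success(f64),                          // success:<fraction>
    Ttfb { percentile: f64, max_ms: f64 }, // ttfb:pXX@<duration>
    Rtf { percentile: f64, max: f64 },     // rtf:pXX@<value>
    Underrun(f64),                         // underrun:<fraction>
}

/// Converts "800ms" or "1s" to milliseconds.
fn parse_ms(s: &str) -> Option<f64> {
    if let Some(n) = s.strip_suffix("ms") {
        n.parse::<f64>().ok()
    } else {
        s.strip_suffix('s')?.parse::<f64>().ok().map(|v| v * 1000.0)
    }
}

fn parse_clause(clause: &str) -> Option<Clause> {
    let (name, value) = clause.trim().split_once(':')?;
    match name {
        "success" => value.parse::<f64>().ok().map(Clause::Success),
        "underrun" => value.parse::<f64>().ok().map(Clause::Underrun),
        "ttfb" | "rtf" => {
            // Split "p95@800ms" into the percentile and the threshold.
            let (pct, limit) = value.split_once('@')?;
            let percentile: f64 = pct.strip_prefix('p')?.parse().ok()?;
            if name == "ttfb" {
                Some(Clause::Ttfb { percentile, max_ms: parse_ms(limit)? })
            } else {
                Some(Clause::Rtf { percentile, max: limit.parse().ok()? })
            }
        }
        _ => None, // unknown clause name
    }
}

/// Parses a comma-separated target; any invalid clause rejects the whole spec.
fn parse_target(spec: &str) -> Option<Vec<Clause>> {
    spec.split(',').map(parse_clause).collect()
}
```

Since the target is a conjunction, an evaluator built on this would pass only if every parsed clause passes, and would fail any clause whose metric cannot be computed.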

Input text selection

Armchair chooses the request text in the following order:

  • --text: use this text for every request
  • --inputs <PATH>: read PATH and pick a non-empty line uniformly at random per request
  • If neither is provided, use the built-in text pool

Note: --text and --inputs are mutually exclusive.

Dump all request bodies

Use --dump <DIR> to save the request text and response body of every request.

  • Armchair will create a subdirectory named armchair-YYYYMMDD-HHmmss (UTC) inside <DIR>.
  • Each request is written as:
    • req-x-y.in containing the request text
    • req-x-y.wav if the request is successful
    • req-x-y.out if the request is an error (HTTP error or request failure)