armchair

Crates.io	armchair
lib.rs	armchair
version	0.3.0
created_at	2025-10-16 08:05:26.91467+00
updated_at	2026-01-21 08:57:39.554605+00
description	Concurrency benchmarking tool for Rime TTS services
id	1885600
size	123,636

Ryan Li (ryanli)

Armchair

Armchair is a load-testing binary for benchmarking Rime's TTS service under concurrent requests.

Primary use cases:

- To find the time-to-first-byte (TTFB) and real-time factor (RTF) for a given concurrency level.
- To find the maximum concurrency that satisfies the given performance targets.
- To find an optimal client-side buffer size to avoid underrun issues.

For audio streaming with concurrent sessions, TTFB and RTF are the key performance indicators. Real-time streaming requires RTF to stay below 1, so a maximum concurrency is typically imposed to keep it there.

Supported features:

- Bisection to find the maximum concurrency based on a configurable performance target (success, TTFB, RTF, underrun)
- TTFB and RTF metrics for a given concurrency
- Session start staggering via an exponential distribution
- Intra-session delays via a truncated normal distribution with playback-aware waiting
- Client-side buffer simulation and underrun detection

Methodology

This tool simulates many concurrent streaming sessions and evaluates performance against a configurable target.

Session model:
- At a given concurrency C, C sessions are launched.
- Session starts are staggered by an exponential inter-arrival process with rate λ (--session-rate, starts/second).
- Each session performs -n/--requests-per-session sequential requests.
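
The staggering described above can be illustrated by accumulating exponential gaps into start offsets. This is a minimal sketch, not Armchair's implementation: the function name `session_start_offsets` is hypothetical, the fixed seed is only there to make the demo repeatable (Armchair seeds from system entropy), and whether the first session starts at time zero or after one gap is an assumption.

```python
import random
from typing import List


def session_start_offsets(concurrency: int, session_rate: float,
                          rng: random.Random) -> List[float]:
    """Return start offsets in seconds for `concurrency` sessions.

    Gaps between consecutive starts are exponential with rate `session_rate`
    (starts/second), so the mean gap is 1 / session_rate.
    Assumption: the first session also waits one gap before starting.
    """
    offsets = []
    t = 0.0
    for _ in range(concurrency):
        t += rng.expovariate(session_rate)  # exponential inter-arrival gap
        offsets.append(t)
    return offsets


if __name__ == "__main__":
    # Fixed seed for a repeatable demo only.
    for i, offset in enumerate(session_start_offsets(4, 5.0, random.Random(0))):
        print(f"session {i} starts at {offset:.3f}s")
```

With a rate of 5 starts/second the average gap is 0.2s, so 16 sessions ramp up over roughly three seconds rather than all hitting the service at once.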

Per-request timing and metrics:
- TTFB is measured from request send to the first received byte.
- Elapsed is the total time to stream the entire response.
- Audio duration is parsed from the WAV headers; if parsing fails the request is treated as non-audio for RTF purposes.
- RTF is computed as (elapsed − TTFB) / audio_duration. Requests with a missing or invalid audio duration are excluded from RTF percentiles.
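
The RTF formula above is easiest to see as a small worked function. This is a sketch under assumptions: the name `rtf` and the use of `None` to mark an excluded sample are illustrative choices, not Armchair's API.

```python
import math
from typing import Optional


def rtf(elapsed_s: float, ttfb_s: float,
        audio_duration_s: Optional[float]) -> Optional[float]:
    """Real-time factor: streaming time divided by audio duration.

    Returns None when the audio duration is missing or invalid, so the
    sample can be left out of the RTF percentiles.
    """
    if audio_duration_s is None:
        return None
    if not math.isfinite(audio_duration_s) or audio_duration_s <= 0:
        return None
    return (elapsed_s - ttfb_s) / audio_duration_s


if __name__ == "__main__":
    # 2.0s of streaming for 4.0s of audio: twice as fast as real time.
    print(rtf(elapsed_s=2.5, ttfb_s=0.5, audio_duration_s=4.0))
    # WAV header could not be parsed: no RTF sample.
    print(rtf(elapsed_s=2.5, ttfb_s=0.5, audio_duration_s=None))
```

An RTF below 1 means audio arrives faster than it plays back; above 1, a listener would eventually run out of audio.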

Intra-session delay model (traffic shaping):
- After each request completes, the tool waits out any remaining playback time if the audio was synthesized faster than real time, i.e. max(0, audio_duration − (elapsed − TTFB)).
- Then it sleeps an additional delay sampled from a Normal distribution with parameters --intra-session-delay-mu and --intra-session-delay-sigma, truncated to [--intra-session-delay-min, --intra-session-delay-max].
- The first request in a session has no intra-session delay; session start staggering is controlled by the exponential process above.
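
The two waiting steps above can be sketched as a single function. This is an illustration rather than Armchair's code: the name `inter_request_wait` is hypothetical, and the truncation is implemented here by clamping the normal sample to [min, max], which is an assumption (a truncated normal could also be drawn by resampling until the value falls in range).

```python
import random


def inter_request_wait(elapsed_s: float, ttfb_s: float, audio_duration_s: float,
                       mu_s: float, sigma_s: float, min_s: float, max_s: float,
                       rng: random.Random) -> float:
    """Seconds to wait after a request before sending the next one."""
    # Step 1: wait out playback the listener has not heard yet.
    playback_wait = max(0.0, audio_duration_s - (elapsed_s - ttfb_s))
    # Step 2: add a Normal(mu, sigma) delay, clamped to [min, max] (assumption).
    delay = min(max(rng.gauss(mu_s, sigma_s), min_s), max_s)
    return playback_wait + delay


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a repeatable demo only
    # 4.0s of audio streamed in 2.5s leaves 1.5s of playback to wait out.
    wait = inter_request_wait(3.0, 0.5, 4.0, 10.0, 5.0, 0.0, 20.0, rng)
    print(f"wait {wait:.2f}s before the next request")
```

The playback-aware part models a caller who does not speak again until the previous audio has finished playing.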

Buffer underrun detection:
- Simulates a client-side buffer of size --client-buffer (default 0ms).
- Playback starts once the buffer is full.
- An underrun occurs if the buffer empties before playback completes.
- Requires valid WAV headers to determine the byte rate.
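
One way to picture the underrun check is to replay chunk arrival times against a playback clock. The sketch below is a simplified model under stated assumptions, not Armchair's implementation: `has_underrun` and the `(arrival_time, num_bytes)` chunk representation are hypothetical, playback is assumed to start at the arrival of the chunk that fills the buffer, and a stream that ends before the buffer fills is assumed to play without underrun.

```python
from typing import Iterable, Optional, Tuple


def has_underrun(chunks: Iterable[Tuple[float, int]], byte_rate: int,
                 buffer_s: float) -> bool:
    """Return True if playback would run dry before the stream finishes.

    chunks: (arrival_time_s, num_bytes) pairs in arrival order.
    byte_rate: audio bytes per second, as read from the WAV header.
    buffer_s: seconds of audio to collect before playback starts.
    """
    received_s = 0.0                    # seconds of audio received so far
    start_s: Optional[float] = None     # wall-clock time playback began
    for arrival_s, num_bytes in chunks:
        # Audio played by now exceeds what had arrived: the buffer emptied.
        if start_s is not None and arrival_s - start_s > received_s:
            return True
        received_s += num_bytes / byte_rate
        if start_s is None and received_s >= buffer_s:
            start_s = arrival_s
    return False


if __name__ == "__main__":
    # 48000 bytes/s corresponds to 24 kHz, 16-bit mono; each chunk is 0.5s.
    stream = [(0.0, 24000), (0.4, 24000), (1.2, 24000)]
    print(has_underrun(stream, 48000, buffer_s=0.0))
    print(has_underrun(stream, 48000, buffer_s=1.0))
```

In this toy stream the third chunk arrives late, so starting playback immediately runs dry while a 1s initial buffer absorbs the gap. This is the trade-off that --client-buffer lets you explore.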

Aggregation and statistics:
- A request counts as a success when the HTTP status is 2xx and the body is non-empty.
- Percentiles (p50/p90/p95/p99) are computed via linear interpolation over sorted samples; NaN/invalid values are excluded from the relevant metric’s distribution.
- A startup config dump prints all key parameters for reproducibility.
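
A percentile by linear interpolation over sorted samples can be sketched as follows. The exact rank convention Armchair uses is not stated; this sketch assumes rank = p/100 × (n − 1), the same convention as NumPy's default "linear" method, and the name `percentile` is illustrative.

```python
import math
from typing import Iterable, Optional


def percentile(samples: Iterable[Optional[float]], p: float) -> Optional[float]:
    """p-th percentile (0..100) by linear interpolation between neighbours.

    None, NaN and infinite values are dropped first; returns None if no
    valid samples remain.
    """
    values = sorted(v for v in samples if v is not None and math.isfinite(v))
    if not values:
        return None
    rank = (p / 100.0) * (len(values) - 1)  # fractional index into `values`
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return values[lo] + (values[hi] - values[lo]) * (rank - lo)


if __name__ == "__main__":
    ttfb_ms = [100.8, 117.6, float("nan"), 104.4, 141.0]
    for p in (50, 90, 99):
        print(f"p{p} = {percentile(ttfb_ms, p)}")
```

Returning `None` for an empty distribution lets a caller treat "metric cannot be computed" as a distinct outcome rather than as zero.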

Performance target evaluation (--target):
- The target is a conjunction: all configured clauses must pass.
- Supported clauses: success:<fraction>, ttfb:pXX@<duration>, rtf:pXX@<value>, underrun:<fraction>.
- If a metric cannot be computed (e.g., no valid audio for RTF), that clause fails.
- Results show OK/FAIL per clause, with color when the terminal supports it.
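
To make the clause syntax and the conjunction rule concrete, here is a small parser and evaluator. It is a sketch under assumptions, not Armchair's parser: the function names are hypothetical, only `ms` and `s` duration units are handled (the ones shown in this README), and the comparison directions (success at or above its fraction, everything else at or below its limit) are inferred rather than documented.

```python
from typing import Dict, List, Optional, Tuple

Clause = Tuple[str, Optional[float], float]  # (metric, percentile, threshold)


def parse_duration(text: str) -> float:
    """Convert '500ms', '1.5s' or '10s' to seconds."""
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unsupported duration: {text!r}")


def parse_target(spec: str) -> List[Clause]:
    clauses: List[Clause] = []
    for clause in spec.split(","):
        metric, _, rest = clause.strip().partition(":")
        if metric in ("success", "underrun"):
            clauses.append((metric, None, float(rest)))
        elif metric in ("ttfb", "rtf"):
            pct, _, limit = rest.partition("@")
            threshold = parse_duration(limit) if metric == "ttfb" else float(limit)
            clauses.append((metric, float(pct.lstrip("p")), threshold))
        else:
            raise ValueError(f"unknown clause: {clause!r}")
    return clauses


def evaluate(clauses: List[Clause],
             observed: Dict[Tuple[str, Optional[float]], Optional[float]]):
    """Return (per-clause results, overall pass). A missing metric fails."""
    results = {}
    for metric, pct, threshold in clauses:
        value = observed.get((metric, pct))
        if value is None:
            ok = False                  # metric could not be computed
        elif metric == "success":
            ok = value >= threshold     # assumed direction
        else:
            ok = value <= threshold     # assumed direction
        results[(metric, pct)] = ok
    return results, all(results.values())


if __name__ == "__main__":
    clauses = parse_target("success:0.99,ttfb:p95@800ms,rtf:p90@1.20,underrun:0.01")
    observed = {("success", None): 1.0, ("ttfb", 95.0): 0.5,
                ("rtf", 90.0): None, ("underrun", None): 0.0}
    results, passed = evaluate(clauses, observed)
    for key, ok in results.items():
        print(key, "OK" if ok else "FAIL")
    print("target met:", passed)
```

In the demo the RTF value is missing, so that clause fails and the whole target fails with it, matching the rule that an uncomputable metric fails its clause.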

Maximum concurrency search (when --concurrency is omitted):
- Exponential growth: repeatedly doubles concurrency (1, 2, 4, …) until the performance target fails; waits 10s between trials.
- Binary search: bisects between the last known-good and the first failing concurrency to find the largest value that still satisfies the target.
- After discovery, a final run at the chosen concurrency prints a full summary.
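
The doubling-then-bisection search can be sketched independently of any load testing by treating one trial as a pass/fail callback. This is a minimal illustration, not Armchair's code: `find_max_concurrency`, the `passes` callback and the `limit` safety cap are hypothetical, the 10s pause between trials is omitted, and the callback is assumed to be monotonic (real trials are stochastic, so repeated runs may land on different values).

```python
from typing import Callable


def find_max_concurrency(passes: Callable[[int], bool],
                         limit: int = 1 << 20) -> int:
    """Largest concurrency c with passes(c) True; 0 if even c=1 fails."""
    good, c = 0, 1
    # Phase 1: exponential growth (1, 2, 4, ...) until the target fails.
    while c <= limit and passes(c):
        good, c = c, c * 2
    if c > limit:
        return good  # never failed below the safety cap
    bad = c
    # Phase 2: bisection between last known-good and first failing value.
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(mid):
            good = mid
        else:
            bad = mid
    return good


if __name__ == "__main__":
    # Stand-in for a real trial: pretend the service copes with up to 21 sessions.
    print(find_max_concurrency(lambda c: c <= 21))
```

Doubling finds a failing upper bound in a logarithmic number of trials, and bisection then narrows the gap by half per trial, which matters when every trial is a full load test.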

Note: The traffic and delay processes are stochastic; repeated runs will vary. Randomness is seeded from system entropy.

Usage

Installation

cargo install armchair

Maximum concurrency

To find the maximum concurrency where each session sends 5 requests with:

- session starts following an exponential process (lambda=5 starts/sec)
- intra-session delays sampled from a truncated normal N(mu=10s, sigma=5s), clamped to [0s, 20s]
- performance targets success:1.00,ttfb:p99@1s,rtf:p99@1.00,underrun:0.00 (default)

armchair --url '<RIME_SERVICE>' --token '<RIME_API_KEY>'

The tool should then report metrics like:

=== MAXIMUM CONCURRENCY FOUND: 16 ===

...

----- Summary -----
concurrency: 16
total: 80 success: 80 (100.0%)
Buffer underrun: 0 (0.0%)
TTFB ms: mean=104.4 p50=100.8 p90=117.6 p95=126.2 p99=141.0
Elapsed ms: mean=13924.0 p50=13772.5 p90=16412.8 p95=17065.3 p99=18527.3
RTF: mean=1.067 p50=1.061 p90=1.170 p95=1.208 p99=1.254

Fixed concurrency

When the --concurrency flag is specified, the tool skips the search and reports the metrics for that concurrency level only.

Request customization

- -n: Number of requests in each session, e.g. 5
- --session-rate: Mean session starts per second; starts are staggered as a Poisson process (exponential inter-arrival times), e.g. 5
- --intra-session-delay-mu: Intra-session delay mean, e.g. 10s
- --intra-session-delay-sigma: Intra-session delay standard deviation, e.g. 5s
- --intra-session-delay-min: Intra-session delay minimum clamp, e.g. 0s
- --intra-session-delay-max: Intra-session delay maximum clamp, e.g. 20s
- --client-buffer: Client-side initial playback buffer, e.g. 100ms
- --prepend-request-id: If set, prepend req-x-y to the request text to avoid cache hits (default: false)
- --target: Performance target specification, e.g. success:1.00,ttfb:p90@500ms,rtf:p90@1.00,underrun:0.00
- --percentiles: List of percentiles to report, e.g. 1,25,50,90,99

Duration value syntax

Flags that accept durations (e.g., --intra-session-delay-mu) take values with units:

500ms, 1.5s, 10s

Performance target flag

--target accepts a comma-separated list:

success:<fraction>,ttfb:pXX@<duration>,rtf:pXX@<value>,underrun:<fraction>

Examples:

--target success:0.99,ttfb:p95@800ms,rtf:p90@1.20,underrun:0.01
--target success:1.00,ttfb:p90@1s,rtf:p90@1.00,underrun:0.00

Input text selection

Armchair chooses the request text in the following order:

1. --text: use this text for every request
2. --inputs <PATH>: read PATH and pick a non-empty line uniformly at random per request
3. If neither is provided, use the built-in text pool

Note: --text and --inputs are mutually exclusive.
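
The selection order above amounts to a short precedence check. The sketch below is illustrative only: `choose_text` is a hypothetical name, the lines of the --inputs file are passed in already read rather than loaded from PATH, and picking uniformly at random from the built-in pool is an assumption, since the README does not say how the pool is sampled.

```python
import random
from typing import Optional, Sequence


def choose_text(text: Optional[str], input_lines: Optional[Sequence[str]],
                builtin_pool: Sequence[str], rng: random.Random) -> str:
    """Pick the text for one request following the documented precedence."""
    if text is not None and input_lines is not None:
        raise ValueError("--text and --inputs are mutually exclusive")
    if text is not None:
        return text                       # same text for every request
    if input_lines is not None:
        # Only non-empty lines are candidates.
        candidates = [line for line in input_lines if line.strip()]
        return rng.choice(candidates)
    return rng.choice(list(builtin_pool))  # assumption: uniform pick


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a repeatable demo only
    lines = ["Hello there.", "", "How can I help you today?"]
    print(choose_text(None, lines, ["built-in sentence"], rng))
```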

Dump requests and responses

Use --dump <DIR> to write the request text and the response body of every request to disk.

Armchair will create a subdirectory named armchair-YYYYMMDD-HHmmss (UTC) inside <DIR>.
Each request is written as:
- req-x-y.in containing the request text
- req-x-y.wav if the request is successful
- req-x-y.out if the request is an error (HTTP error or request failure)
