sample-lines

Created: 2025-04-12 21:29:36.055246+00
Updated: 2025-04-13 22:39:02.203846+00
Description: Command-line tool to sample lines from a file or stdin without replacement. It runs in one pass without reading the whole input into memory using reservoir sampling.
Repository: https://github.com/stringertheory/sample
Mike Stringer (stringertheory)


sample-lines

samp is a fast command-line tool to randomly sample lines from a file or standard input using reservoir sampling. It samples uniformly without replacement.
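
To illustrate the idea, here is a minimal sketch of reservoir sampling (the classic "Algorithm R") in Rust. It is not taken from samp's source: the function name `reservoir_sample` and the small `SplitMix64` generator are hypothetical stand-ins, chosen so the example needs no external crates. A real tool would more likely use a well-tested RNG crate.

```rust
/// Tiny SplitMix64 generator, used here only to avoid external crates.
/// (Assumption for illustration; not necessarily what samp uses.)
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Integer in 0..bound. The modulo introduces a tiny bias that is
    /// acceptable for illustration only.
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Keep a uniform random sample of at most `k` lines in a single pass,
/// holding only `k` lines in memory at any time.
fn reservoir_sample<I>(lines: I, k: usize, seed: u64) -> Vec<String>
where
    I: Iterator<Item = String>,
{
    let mut rng = SplitMix64::new(seed);
    let mut reservoir: Vec<String> = Vec::with_capacity(k);
    for (i, line) in lines.enumerate() {
        if i < k {
            // Fill the reservoir with the first k lines.
            reservoir.push(line);
        } else {
            // Line i (0-based) replaces a random slot with probability k / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = line;
            }
        }
    }
    reservoir
}
```

Calling `reservoir_sample` twice with the same input and seed returns the same lines, which is the property a `--seed` option relies on. Note that this sketch returns lines in reservoir order, not input order.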

Good for:

  • Downsampling large datasets
  • Sampling logs for debugging
  • Creating reproducible random subsets of data

samp works much like head or tail, except that it returns random lines:

head -n 10 < data.txt   # outputs the first 10 lines
tail -n 10 < data.txt   # outputs the last 10 lines
samp -n 10 < data.txt   # outputs 10 random lines

Installation

If you have Rust installed, you can install samp with:

cargo install sample-lines

Or build it from source:

git clone https://github.com/stringertheory/sample-lines.git
cd sample-lines
cargo build --release

Usage

samp (-n <NUM> | -r <RATE>) [--seed <SEED>] [--preserve-headers [N]] [FILE]

Here are a few examples:

samp --help                                   # Show help
cat data.txt | samp -n 10                     # Sample 10 lines from a pipe
samp -n 10 data.txt                           # Read from a named file
samp -n 10 < data.txt                         # Read from redirected standard input
samp -n 10 --seed 17 < data.txt               # Reproducible sample
cat data.csv | samp -n 10 --preserve-headers  # Preserve 1 header line
samp -r 0.01 < big.log                        # Keep ~1% of lines
samp -r 0.10 --seed 17 data.csv -p            # Reproducible 10% sample

Options

Option                      Description
-n, --number <NUM>          Number of lines to sample (required unless --rate is given)
-r, --rate <RATE>           Sampling rate: probability of including each line (e.g., 0.05)
-s, --seed <SEED>           Optional seed for reproducible sampling
-p, --preserve-headers [N]  Preserve the first N lines as headers (default: 1 if the flag is given without N)
-h, --help                  Show help message
--version                   Show the version number
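
As a rough illustration of how rate-based sampling and header preservation could fit together, the sketch below passes the first few lines through untouched and then keeps each remaining line independently with probability `rate`. This is an assumption about the behavior described by the options above, not samp's actual implementation; `rate_sample` and the small `SplitMix64` generator are hypothetical names used to keep the example free of external crates.

```rust
/// Tiny SplitMix64 generator, used here only to avoid external crates.
/// (Assumption for illustration; not necessarily what samp uses.)
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Float in the half-open range [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Pass the first `headers` lines through unchanged, then keep each
/// remaining line independently with probability `rate`.
fn rate_sample<I>(lines: I, rate: f64, headers: usize, seed: u64) -> Vec<String>
where
    I: Iterator<Item = String>,
{
    let mut rng = SplitMix64::new(seed);
    let mut out = Vec::new();
    for (i, line) in lines.enumerate() {
        // Header lines never consume a random draw, so they are always kept.
        if i < headers || rng.next_f64() < rate {
            out.push(line);
        }
    }
    out
}
```

Unlike the fixed-size `-n` mode, the number of lines returned here varies from run to run: a rate of 0.01 keeps about 1% of the lines, not exactly 1%. Fixing the seed makes the selection repeatable for the same input.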

Testing

cargo clean
cargo build # the tests need the binary to exercise stdin/stderr
cargo test

License

Licensed under the MIT License.

Contributing

Issues and pull requests are welcome! If you have an idea, a feature request, or a bug report, feel free to open an issue or PR.
