randlines

version: 0.1.3
source: src
created_at: 2020-11-18 20:51:58.382676
updated_at: 2020-11-19 00:48:48.421831
description: Similar to shuf(1), but probabilistic and minimalistic with respect to memory
homepage: https://github.com/miku/randlines
id: 313774
size: 15,468
Martin Czygan (miku)

Print a random subset of lines from a line-oriented file. It picks up where shuf gets killed.

Installation

$ cargo install randlines

Usage

$ randlines -h
randlines 0.1.1

Emit a random subset of lines from a file. This is a probabilistic program, you
will not get exactly `n` lines.

Typically, you can use shuf(1) which uses reservoir sampling and is very
efficient. However, if we want to extract 10M random lines from a file of 100M
lines, shuf(1) might be killed. However, randlines will not shuffle lines, just
skip over random number of lines.

USAGE:
    randlines [OPTIONS] [input]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -n <n>                          [default: 16]
    -s, --size-hint <size-hint>

ARGS:
    <input>

randlines emits a random subset of lines from a file. It is a probabilistic program, so you will not get exactly n lines.

Typically, you can use shuf(1), which uses reservoir sampling and is very efficient. However, when extracting 10M random lines from a file of 100M lines, shuf(1) might be killed. randlines does not shuffle lines; it only skips over a random number of lines between the ones it emits, so the output keeps the input order.
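
The idea of emitting a random subset in a single pass can be sketched as follows. This is an illustration only, not the code in this crate: it assumes each line is kept independently with probability p = n / size_hint, which may differ from how randlines actually uses -n and --size-hint. The names XorShift64 and sample_lines are hypothetical, and the small built-in generator only stands in for a proper one such as the rand crate.

```rust
use std::io::{self, BufRead, Write};

/// Tiny xorshift64* generator so the sketch needs no external crates.
/// Hypothetical helper; not cryptographically secure.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // A zero state would stay zero forever, so substitute a fixed non-zero seed.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Returns a float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let x = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D);
        // Use the top 53 bits to fill the mantissa of an f64.
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Copies each line from `input` to `output` with probability `p`,
/// holding only one line in memory at a time. Returns the number emitted.
fn sample_lines<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    p: f64,
    rng: &mut XorShift64,
) -> io::Result<usize> {
    let mut emitted = 0;
    for line in input.lines() {
        let line = line?;
        if rng.next_f64() < p {
            writeln!(output, "{}", line)?;
            emitted += 1;
        }
    }
    Ok(emitted)
}

fn main() -> io::Result<()> {
    // Assumed parameters: aim for about 16 lines from roughly 1000 input lines.
    let (n, size_hint) = (16.0_f64, 1000.0_f64);
    let p = (n / size_hint).min(1.0);
    let mut rng = XorShift64::new(42);
    let stdin = io::stdin();
    let stdout = io::stdout();
    let emitted = sample_lines(stdin.lock(), stdout.lock(), p, &mut rng)?;
    eprintln!("emitted {} lines", emitted);
    Ok(())
}
```

Because every line is decided on its own, memory use stays constant regardless of input size, at the cost of getting only approximately n lines, in input order rather than shuffled.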

TODO

  • compress temporary output when reading from stdin