Crates.io | randlines |
lib.rs | randlines |
version | 0.1.3 |
source | src |
created_at | 2020-11-18 20:51:58.382676 |
updated_at | 2020-11-19 00:48:48.421831 |
description | Similar to shuf(1), but probabilistic and minimalistic with respect to memory |
homepage | https://github.com/miku/randlines |
repository | |
max_upload_size | |
id | 313774 |
size | 15,468 |
Print a random subset of lines from a line-oriented file. Picks up where shuf(1) gets killed.
$ cargo install randlines
$ randlines -h
randlines 0.1.1
Emit a random subset of lines from a file. This is a probabilistic program; you
will not get exactly `n` lines.
Typically, you can use shuf(1), which uses reservoir sampling and is very
efficient. However, if we want to extract 10M random lines from a file of 100M
lines, shuf(1) might be killed, since its reservoir has to hold all 10M lines in
memory. randlines does not shuffle lines; it just skips over a random number of lines.
USAGE:
    randlines [OPTIONS] [input]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -n <n>                            [default: 16]
    -s, --size-hint <size-hint>

ARGS:
    <input>
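The help text only says that randlines skips over a random number of lines and will not emit exactly `n` lines. One way to get that behaviour is Bernoulli sampling: keep each line independently with probability p = n / size-hint, and draw the gap to the next kept line from a geometric distribution instead of flipping a coin for every line. The sketch below is not taken from the crate's source. Reading `--size-hint` as an estimate of the total line count, the names `sample_lines` and `next_skip`, and the inline xorshift generator (used only so the example needs no external crates) are all assumptions.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: a tiny xorshift64* PRNG so the sketch needs no
/// external crates (the real crate may well use `rand` instead).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform float in (0, 1]; never 0, so `ln` below stays finite.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
}

/// Number of lines to skip before the next kept line when every line is
/// kept independently with probability `p` (geometric distribution).
fn next_skip(rng: &mut XorShift64, p: f64) -> u64 {
    if p >= 1.0 {
        return 0;
    }
    let u = rng.next_f64();
    // Inverse-CDF sampling; float-to-int `as` casts saturate in Rust.
    (u.ln() / (1.0 - p).ln()).floor() as u64
}

/// Assumed semantics: write each line with probability p = n / size_hint,
/// so about `n` lines come out when `size_hint` is close to the real line
/// count. Only one line is held in memory at a time. Returns lines written.
fn sample_lines<R: BufRead, W: Write>(
    input: R,
    out: &mut W,
    n: u64,
    size_hint: u64,
    seed: u64,
) -> io::Result<u64> {
    if n == 0 {
        return Ok(0);
    }
    let p = (n as f64 / size_hint.max(1) as f64).min(1.0);
    let mut rng = XorShift64(seed | 1); // xorshift state must be non-zero
    let mut skip = next_skip(&mut rng, p);
    let mut emitted = 0;
    for line in input.lines() {
        let line = line?;
        if skip > 0 {
            skip -= 1; // pass over this line without drawing a random number
            continue;
        }
        writeln!(out, "{}", line)?;
        emitted += 1;
        skip = next_skip(&mut rng, p);
    }
    Ok(emitted)
}

fn main() -> io::Result<()> {
    // 1000 input lines, asking for about 16 (the CLI's default for -n).
    let text: String = (0..1000).map(|i| format!("line {}\n", i)).collect();
    let stdout = io::stdout();
    let mut out = stdout.lock();
    let k = sample_lines(text.as_bytes(), &mut out, 16, 1000, 42)?;
    eprintln!("emitted {} lines (expected about 16)", k);
    Ok(())
}
```

The number of emitted lines is binomially distributed around n × (actual lines / size hint), which matches the "not exactly `n` lines" caveat, and the kept lines come out in input order. Memory use stays at one line buffer no matter how large `n` is.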
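For contrast, reservoir sampling, which the help text attributes to shuf(1), returns exactly `n` lines (or all of them, if the input is shorter) but must keep every sampled line in memory until the input ends. That is why asking for a 10M-line sample can get the process killed. Below is a minimal sketch of the textbook Algorithm R, not GNU shuf's actual code; the function name `reservoir_sample` and the inline xorshift generator are hypothetical stand-ins.

```rust
use std::io::{self, BufRead};

/// Hypothetical xorshift64* helper standing in for a real RNG crate.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Integer in 0..bound; the small modulo bias is ignored in this sketch.
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Algorithm R: a uniform random sample of exactly min(n, line count) lines.
/// The reservoir keeps up to `n` whole lines until the input ends, so memory
/// grows with `n`.
fn reservoir_sample<R: BufRead>(input: R, n: usize, seed: u64) -> io::Result<Vec<String>> {
    let mut rng = XorShift64(seed | 1); // xorshift state must be non-zero
    let mut reservoir: Vec<String> = Vec::new();
    for (i, line) in input.lines().enumerate() {
        let line = line?;
        if i < n {
            reservoir.push(line); // fill the reservoir first
        } else {
            // Line i replaces a random slot with probability n / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < n {
                reservoir[j] = line;
            }
        }
    }
    Ok(reservoir)
}

fn main() -> io::Result<()> {
    let text: String = (0..1000).map(|i| format!("line {}\n", i)).collect();
    let sample = reservoir_sample(text.as_bytes(), 16, 42)?;
    for line in &sample {
        println!("{}", line);
    }
    eprintln!("kept exactly {} lines", sample.len());
    Ok(())
}
```

randlines gives up this exact-count guarantee: per its help text you will not get exactly `n` lines, but in exchange it does not have to buffer the sample.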