# fasta-stats

Compute simple descriptive statistics on a FASTA file.

## Usage

```
Simple descriptive statistics on FASTA (biological sequence) data

Usage: fasta-stats [OPTIONS] [FILE]

Arguments:
  [FILE]

Options:
  -m, --median
  -d, --stddev
  -s, --sample
      --hint
  -h, --help     Print help
  -V, --version  Print version
```

By default, this uses a streaming approach to compute the mean, min, max, and count, so minimal memory should be required. If the `--median` or `--stddev` flag is present, more memory is required, because these calculations are not streamed: values are stored in a vector first.

To limit memory usage, specify the `--sample` argument. Its value is interpreted as "1 in n": with `--sample 100`, an expected 1 in 100 values is stored in the vector used for these calculations. Larger values of `--sample` lower memory usage but make the results less accurate.

This simple program expects to read FASTA data either on STDIN or from a named file. It outputs the total number of sequences, as well as the min, max, mean, and optionally the median and standard deviation, of the sequence lengths.

If you have a compressed FASTA file, you can pipe it through `zcat` or `gunzip` to decompress it on the fly, for example `zcat sequences.fa.gz | fasta-stats` (the file name here is only a placeholder).
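
The streaming and "1 in n" sampling behaviour described above can be illustrated with a short Python sketch. This is not the tool's actual implementation: the function names `sequence_lengths` and `summarize` are hypothetical, the sampling is assumed to be an independent 1-in-n coin flip per sequence, and the population standard deviation is used because the README does not say which variant `fasta-stats` reports.

```python
import io
import math
import random
import statistics


def sequence_lengths(handle):
    """Yield the length of each sequence in a FASTA text stream."""
    length = None  # None until the first '>' header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length


def summarize(lengths, sample=None, seed=None):
    """Stream count/min/max/mean; keep roughly 1 in `sample` lengths
    for the median and standard deviation (assumed sampling scheme)."""
    rng = random.Random(seed)
    count, total = 0, 0
    lo, hi = math.inf, -math.inf
    kept = []  # only grows when sampling is requested
    for n in lengths:
        count += 1
        total += n
        lo, hi = min(lo, n), max(hi, n)
        if sample is not None and rng.randrange(sample) == 0:
            kept.append(n)
    stats = {
        "count": count,
        "min": lo if count else None,
        "max": hi if count else None,
        "mean": total / count if count else None,
    }
    if kept:
        stats["median"] = statistics.median(kept)
        stats["stddev"] = statistics.pstdev(kept)  # assumption: population stddev
    return stats


# Three records with lengths 6, 1, and 8; sample=1 keeps every length.
fasta = ">a\nACGT\nAC\n>b\nA\n>c\nACGTACGT\n"
print(summarize(sequence_lengths(io.StringIO(fasta)), sample=1))
```

With `sample=None`, only four running values are held in memory regardless of input size. With `sample=n`, the stored list holds about 1/n of the lengths, which is the memory-versus-accuracy trade-off the README describes.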