# fasta-stats
Compute simple descriptive statistics on a FASTA file

## Usage

```
Simple descriptive statistics on FASTA (biological sequence) data

Usage: fasta-stats [OPTIONS] [FILE]

Arguments:
  [FILE]

Options:
  -m, --median
  -d, --stddev
  -s, --sample <SAMPLE>
      --hint <SIZE_HINT>
  -h, --help              Print help
  -V, --version           Print version
```

By default, the program streams through the input to compute the count, mean, min, and max, so memory usage should stay minimal.
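As an illustration of why the default statistics need so little memory, here is a minimal Python sketch of a streaming accumulator. It is not the program's actual implementation; the `StreamingStats` class and its field names are hypothetical and exist only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingStats:
    """Hypothetical accumulator: keeps four numbers, never the full list."""
    count: int = 0
    total: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, length: int) -> None:
        # Each new sequence length is folded in and then forgotten.
        self.count += 1
        self.total += length
        self.minimum = min(self.minimum, length)
        self.maximum = max(self.maximum, length)

    @property
    def mean(self) -> float:
        # The mean is derived from the running total and count.
        return self.total / self.count if self.count else float("nan")


stats = StreamingStats()
for length in (120, 80, 100):
    stats.update(length)
print(stats.count, stats.minimum, stats.maximum, stats.mean)
```

Memory stays constant no matter how many lengths are fed in, because only the running count, total, minimum, and maximum are kept.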

If the `--median` or `--stddev` flag is present, more memory is required, because these statistics are computed from stored values rather than by streaming. To limit memory usage, pass `--sample`, which is interpreted as "1 in n": with `--sample 100`, an expected 1 in 100 values is stored in a vector for these calculations. Larger values of `--sample` lower memory usage but make the median and standard deviation less accurate.
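The "1 in n" idea can be sketched in Python as follows. This assumes each value is kept independently with probability 1/n, which is one plausible reading of "an expected 1 in n"; the function name `sample_lengths` and the synthetic data are hypothetical, and the program itself may sample differently.

```python
import random
import statistics


def sample_lengths(lengths, n, rng):
    """Keep each length independently with probability 1/n (assumed scheme)."""
    return [x for x in lengths if rng.random() < 1.0 / n]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Synthetic sequence lengths, purely for illustration.
lengths = [rng.randint(50, 150) for _ in range(100_000)]

kept = sample_lengths(lengths, 100, rng)
print(len(kept), "of", len(lengths), "values stored")
print("median (sample vs. full):",
      statistics.median(kept), statistics.median(lengths))
print("stddev (sample vs. full):",
      statistics.stdev(kept), statistics.stdev(lengths))
```

Only the sampled list has to live in memory, so a larger n shrinks it further, at the cost of estimates that drift further from the values computed on the full data.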

The program reads FASTA data from STDIN or from a named file and outputs the total number of sequences, along with the min, max, mean, and optionally the median and standard deviation of the sequence lengths. If you have a compressed FASTA file, you can pipe it through `zcat` or `gunzip -c` to decompress it on the fly.
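For readers unfamiliar with the format, the following Python sketch shows how sequence lengths can be derived from a FASTA text stream taken from either a named file or STDIN. The helper names `sequence_lengths` and `open_input` are hypothetical, and the parsing is deliberately simplified (header lines start with `>`, and every other non-empty line is sequence data); it does not describe how the program itself parses input.

```python
import io
import sys


def sequence_lengths(handle):
    """Yield the length of each record in a FASTA text stream."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            # Sequence data may span several lines; add them up.
            length += len(line)
    if length is not None:
        yield length


def open_input(path=None):
    """Return the named file as a text stream, or STDIN when no path is given."""
    return open(path) if path else sys.stdin


# Small in-memory example standing in for a file or STDIN.
demo = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGGG\n")
print(list(sequence_lengths(demo)))  # [6, 3]
```

Because the lengths are produced one at a time by a generator, they can be consumed as they arrive without holding the whole file in memory.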