fasta-stats

Crates.iofasta-stats
lib.rsfasta-stats
version0.3.1
sourcesrc
created_at2024-07-08 16:34:11.759715
updated_at2024-07-12 03:14:39.266079
descriptionSimple descriptive statistics on FASTA (biological sequence) data
homepage
repository
max_upload_size
id1296144
size40,840
Noah Daniels (ndaniels)

documentation

README

fasta-stats

Compute simple descriptive statistics on a FASTA file

Usage

Simple descriptive statistics on FASTA (biological sequence) data

Usage: fasta-stats [OPTIONS] [FILE]

Arguments:
  [FILE]

Options:
  -m, --median
  -d, --stddev
  -s, --sample <SAMPLE>
      --hint <SIZE_HINT>
  -h, --help              Print help
  -V, --version           Print version

By default, this uses a streaming approach to compute mean, min, max, and count. Minimal memory should be required.

If the median or stddev flags are present, more memory will be required as streaming isn't possible. In order to minimize memory usage, the sample argument can be specified; it is interpreted as "1 in n", as in, if --sample 100 is provided, then an expected 1 in 100 samples will be stored in a vector for purposes of these calculations. Larger values of sample will result in lower memory usage but less-accurate computations.

This simple program expects to read FASTA data either on STDIN or from a named file, and will output the total number of sequences, as well as the min, max, mean, and optionally median and standard deviation, of the sequence lengths. If you have a compressed FASTA file, you can pipe it through zcat or gunzip to decompress it on the fly.

Commit count: 0

cargo fmt