# My (Max's?) Minimal Fasta Toolkit

Minimal, simple fasta tools. Each program is self-contained in the `./src/fasta` directory and follows similar boilerplate code for file handling. So if you feel like contributing and/or adding your own subcommand, please do.

## Usage

Typing `mmft` (shows subcommands) or `mmft -h` (shows specific subcommand) will show the usage of the tool in question.

Commands are added only as and when I need them. If you like what you see, please feel free to contribute a PR with your favourite subcommand.

### Calculations

- `mmft len ` or `cat | mmft len`. Calculates the length of each fasta record.
- `mmft gc ` or `cat | mmft gc`. Calculates the GC content of each fasta record.
- `mmft n50 ` or `cat | mmft n50`. Calculates the n50 of a fasta file (or of a stream of fasta files combined).
- `mmft num ` or `cat | mmft num`. Calculates the number of sequences and the total number of base pairs in the fasta file input(s).
- `mmft revcomp ` or `cat | mmft revcomp`.
- `mmft regex -r "" ` or `cat | mmft regex -r ""`. Extracts fasta records from one or multiple fasta files with headers matching the regex.
- `mmft extract -r 1-100 ` or `cat | mmft extract -r 1-100`. Extracts the first 100 nucleotides from each fasta record. You can of course choose any range, using a dash to separate the numbers.
- `mmft filter -f `. Supply a text file of one ID per line and filter will extract the corresponding fasta records.
- `mmft merge `. Merges multiple fasta files together into the same record.
- `mmft sample -n `. Randomly samples a fasta file (or stream of fasta files) down to a specified number of records.
- `mmft split (-d ) -n `. Splits a fasta file into equal chunks; the last chunk holds the remainder if the record number is not perfectly divisible by the chunk number.

Careful when piping into `mmft`: fasta files are not treated separately, they are treated as a continuum of fasta records.
Hence, while `mmft n50 1.fasta 2.fasta` shows the n50 of each fasta file separately, `cat *.fasta | mmft n50` will calculate the n50 of both files combined. In addition, `mmft sample` loads the entire STDIN into memory, so be careful when piping large files.

Some functions don't support piping (`filter`, `merge`, `sample`, `split`).

All output is printed to STDOUT.

## TODOs

I'll add stuff as and when I have time, or when it is of use. Maybe:

- Simple pattern matching, returning positions.
- Potential ORFs
- Any kmer stuff?
- Testing
- Better documentation
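
To see why the piping caveat under Usage matters for `n50`, here is a rough Python sketch. It is not `mmft`'s own code: the function name `n50` and the record lengths are made up for illustration, and it assumes one common definition of N50 (the largest length L such that records of length L or more cover at least half of all bases). The exact definition `mmft` uses may differ.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: the largest length L such that sequences of
    length >= L together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest record down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical record lengths for two fasta files.
file_1 = [100, 80, 20]
file_2 = [10, 10, 5]

# Files treated separately, as when passing them as arguments.
print(n50(file_1))  # 100
print(n50(file_2))  # 10

# Files treated as one continuum of records, as when piping via cat.
print(n50(file_1 + file_2))  # 80
```

The combined value (80) matches neither per-file value, which is the behaviour to expect when several fasta files are concatenated into a single stream.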