| Crates.io | fasta_rs |
| lib.rs | fasta_rs |
| version | 0.0.1 |
| created_at | 2025-10-10 08:02:47.795716+00 |
| updated_at | 2025-10-10 08:02:47.795716+00 |
| description | Multi purpose fasta toolkit |
| homepage | |
| repository | https://github.com/OscarAspelin95/fasta_rs |
| max_upload_size | |
| id | 1876661 |
| size | 110,723 |
FASTA toolkit, aiming to be an alternative to seqkit.
Clone the repository or download the source code. Enter the fasta_rs directory and run:
cargo build --release
The generated binary is available in target/release/fasta_rs.
Run with:
fasta_rs <subcommand> <args>
The following command randomly samples 50% of the sequences, filters them by GC content, and finally converts the result to a .tsv file.
fasta_rs sample -b 0.5 < file.fasta | fasta_rs filter --min-gc 0.5 | fasta_rs fa2tab > out.tsv
split
Split into one file per sequence.
fasta_rs split --fasta <sequences.fasta> <optional_args>
Optional arguments:
-o/--outdir [fasta_split] - Output directory.
stats
Calculate basic stats.
fasta_rs stats --fasta <sequences.fasta> <optional_args>
Optional arguments:
-o/--outfile [stdout] - Output file.
fa2tab
Generate a .tsv file with basic information about each sequence.
fasta_rs fa2tab --fasta <sequences.fasta> <optional_args>
Optional arguments:
-o/--outfile [stdout] - Output file.
homopolymers
Find homopolymers in sequences.
fasta_rs homopolymers --fasta <sequences.fasta> <optional_args>
Optional arguments:
-m/--min-hp-len [5] - Min homopolymer length to consider.
-s/--strict [false] - Only consider homopolymers for {A, C, G, T, a, c, g, t}.
-o/--outfile [stdout] - Output file.
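As an illustration of what the -m and -s options describe, here is a minimal sketch of a homopolymer scan. It is not taken from the fasta_rs source: the names `Homopolymer` and `find_homopolymers` are hypothetical, and treating runs as case-sensitive with 0-based, half-open coordinates is an assumption.

```rust
/// A homopolymer run: the repeated base plus its 0-based, half-open [start, end) span.
/// (Hypothetical type for illustration, not part of fasta_rs.)
#[derive(Debug, PartialEq)]
pub struct Homopolymer {
    pub base: u8,
    pub start: usize,
    pub end: usize,
}

/// Scan `seq` for runs of one repeated byte that are at least `min_len` long.
/// With `strict` set, only runs of A, C, G or T (either case) are reported.
pub fn find_homopolymers(seq: &[u8], min_len: usize, strict: bool) -> Vec<Homopolymer> {
    let mut runs = Vec::new();
    let mut start = 0;
    while start < seq.len() {
        let base = seq[start];
        // Extend the run while the next byte is identical (assumed case-sensitive).
        let mut end = start + 1;
        while end < seq.len() && seq[end] == base {
            end += 1;
        }
        let allowed = !strict
            || matches!(base, b'A' | b'C' | b'G' | b'T' | b'a' | b'c' | b'g' | b't');
        if end - start >= min_len && allowed {
            runs.push(Homopolymer { base, start, end });
        }
        // Continue scanning right after the run that was just measured.
        start = end;
    }
    runs
}
```

With a minimum length of 5, the sequence ACGGGGGTNNNNNNA yields a G run and an N run in this sketch; with the strict flag only the G run remains.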
filter
Filter sequences based on certain criteria.
fasta_rs filter --fasta <sequences.fasta> <optional_args>
Optional arguments:
--min-len [0] - Minimum sequence length.
--max-len [u64::MAX] - Maximum sequence length.
--min-gc [0.0] - Minimum GC content.
--max-gc [1.0] - Maximum GC content.
--min-ambig [0.0] - Minimum fraction of ambiguous bases.
--max-ambig [1.0] - Maximum fraction of ambiguous bases.
--min-softmask [0.0] - Minimum fraction of softmasked bases.
--max-softmask [1.0] - Maximum fraction of softmasked bases.
--min-entropy [0.0] - Minimum Shannon entropy.
--max-entropy [100.0] - Maximum Shannon entropy.
-o/--outfile [stdout] - Output file.
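The metrics behind these filters can be sketched as follows. This is an illustration under stated assumptions, not the fasta_rs implementation: all function names are hypothetical, fractions are taken over the full sequence length, "ambiguous" is assumed to mean any byte other than A, C, G or T (either case), "softmasked" is assumed to mean lowercase, entropy is assumed to be in bits over case-folded base frequencies, and bounds are assumed to be inclusive.

```rust
/// Fraction of bytes in `seq` for which `pred` holds (0.0 for an empty sequence).
pub fn fraction_where(seq: &[u8], pred: impl Fn(u8) -> bool) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    seq.iter().filter(|&&b| pred(b)).count() as f64 / seq.len() as f64
}

/// GC content: G or C in either case.
pub fn gc_fraction(seq: &[u8]) -> f64 {
    fraction_where(seq, |b| matches!(b, b'G' | b'C' | b'g' | b'c'))
}

/// Assumption: anything that is not A, C, G or T counts as ambiguous.
pub fn ambiguous_fraction(seq: &[u8]) -> f64 {
    fraction_where(seq, |b| !matches!(b.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T'))
}

/// Assumption: softmasked bases are the lowercase ones.
pub fn softmask_fraction(seq: &[u8]) -> f64 {
    fraction_where(seq, |b| b.is_ascii_lowercase())
}

/// Shannon entropy in bits over case-folded byte frequencies (an assumption).
pub fn shannon_entropy(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in seq {
        counts[b.to_ascii_uppercase() as usize] += 1;
    }
    let n = seq.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum::<f64>()
}

/// Inclusive range check, as a --min-x / --max-x pair might apply it.
pub fn in_range(value: f64, min: f64, max: f64) -> bool {
    value >= min && value <= max
}
```

Under these assumptions, a sequence passes a filter such as --min-gc 0.5 when `in_range(gc_fraction(seq), 0.5, 1.0)` holds.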
extract
Extract sub-sequence based on provided range.
fasta_rs extract --fasta <sequences.fasta> <optional_args>
Optional arguments:
-s/--start [0] - Start coordinate (BED offset).
-e/--end [u64::MAX] - End coordinate (BED offset).
-o/--outfile [stdout] - Output file.
Since the coordinates are BED-compatible (0-based start, exclusive end), extracting the ith base is equivalent to using -s i-1 and -e i.
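BED-style coordinates map directly onto a half-open slice. The sketch below shows the idea; the function name `extract` is hypothetical here, and clamping the end to the sequence length (so that a very large default end selects everything) is an assumption about how an open-ended range would be handled, not a description of fasta_rs internals.

```rust
/// Return the 0-based, half-open slice [start, end) of `seq`.
/// `end` is clamped to the sequence length; an empty or out-of-bounds
/// range yields `None`. (Illustrative sketch, not the fasta_rs source.)
pub fn extract(seq: &[u8], start: usize, end: usize) -> Option<&[u8]> {
    let end = end.min(seq.len());
    if start >= end {
        return None;
    }
    Some(&seq[start..end])
}
```

For example, the third base of ACGT corresponds to start 2 and end 3, which this sketch returns as the single base G.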
sample
Sample sequences based on a number or proportion.
fasta_rs sample --fasta <sequences.fasta> <optional_args>
Optional arguments:
-b/--by [1.0] - Number or fraction of sequences to keep.
-o/--outfile [stdout] - Output file.
sort
Sort sequences by a given metric.
fasta_rs sort --fasta <sequences.fasta> <optional_args>
Optional arguments:
--by [length] - {length, id, gc, entropy, softmask, ambiguous}.
-r/--reverse [false] - Sort in descending order.
-o/--outfile [stdout] - Output file.
shuffle
Randomly shuffle sequences.
fasta_rs shuffle --fasta <sequences.fasta> <optional_args>
Optional arguments:
-o/--outfile [stdout] - Output file.
head
View the first n sequences.
fasta_rs head --fasta <sequences.fasta> <optional_args>
Optional arguments:
-n/--num_seqs [5] - Number of sequences to output.
grep
Search and filter sequence ids by regular expressions.
fasta_rs grep --fasta <sequences.fasta> --pattern <regex_string> <optional_args>
Optional arguments:
-o/--outfile [stdout] - Output file.
amplicon
In silico PCR by exact or fuzzy primer matching.
fasta_rs amplicon --fasta <sequences.fasta> --primers <primers.tsv> --search-type {exact, fuzzy} <optional_args>
Optional arguments:
-o/--outfile [stdout] - Output file.
The tab-separated primers.tsv file needs to specify the following for each primer pair:
compress
Homopolymer compress sequences.
fasta_rs compress --fasta <sequences.fasta> <optional_args>
Optional arguments:
-m/--max-hp-len [5] - Compress down to homopolymers of max provided length. E.g., ATCGGGGGGG with -m 3 outputs ATCGGG.
-o/--outfile [stdout] - Output file.
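The compression rule can be sketched in a few lines. This is a hedged illustration rather than the fasta_rs code: the function name `compress_homopolymers` is hypothetical, and comparing bases case-sensitively is an assumption.

```rust
/// Truncate every run of identical bytes to at most `max_hp_len` copies.
/// (Illustrative sketch; runs are compared case-sensitively by assumption.)
pub fn compress_homopolymers(seq: &[u8], max_hp_len: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(seq.len());
    let mut run_len = 0usize;
    for (i, &b) in seq.iter().enumerate() {
        // Count the position of this base within its current run.
        if i > 0 && seq[i - 1] == b {
            run_len += 1;
        } else {
            run_len = 1;
        }
        // Keep the base only while the run is still within the limit.
        if run_len <= max_hp_len {
            out.push(b);
        }
    }
    out
}
```

Applied to ATCGGGGGGG with a maximum length of 3, this sketch produces ATCGGG, matching the example above.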
reverse
Reverse complement sequences.
fasta_rs reverse --fasta <sequences.fasta> <optional_args>
Optional arguments:
-o/--outfile [stdout] - Output file.
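For reference, reverse complementing can be sketched as below. The function names are hypothetical and this is not the fasta_rs source. As a simplifying assumption, only A, C, G and T are complemented (preserving case) and every other byte, such as N or other IUPAC codes, is passed through unchanged.

```rust
/// Complement a single base, preserving case.
/// Assumption: bytes other than A/C/G/T are returned unchanged.
pub fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'T' => b'A',
        b'C' => b'G',
        b'G' => b'C',
        b'a' => b't',
        b't' => b'a',
        b'c' => b'g',
        b'g' => b'c',
        other => other,
    }
}

/// Reverse the sequence and complement each base.
pub fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}
```

Under these assumptions, ATGC becomes GCAT, and applying the function twice returns the original sequence.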