fasta_rs

Crates.io: fasta_rs
lib.rs: fasta_rs
version: 0.0.1
created_at: 2025-10-10 08:02:47.795716+00
updated_at: 2025-10-10 08:02:47.795716+00
description: Multi purpose fasta toolkit
repository: https://github.com/OscarAspelin95/fasta_rs
id: 1876661
size: 110,723
author: Oscar Aspelin (OscarAspelin95)


FASTA toolkit, aiming to be an alternative to seqkit.

Requirements

  • Linux OS (Ubuntu 24.04.2)
  • Rust >= 1.88.0

Installation

Clone the repository or download the source code. Enter the fasta_rs directory and run:
cargo build --release

The generated binary is available in target/release/fasta_rs.

Usage

Run with:
fasta_rs <subcommand> <args>

Example

The following command randomly samples 50% of the sequences, keeps those with a GC content of at least 0.5, and finally converts the result to a .tsv file.
fasta_rs sample -b 0.5 < file.fasta | fasta_rs filter --min-gc 0.5 | fasta_rs fa2tab > out.tsv

Subcommands

fasta_rs split

Split into one file per sequence.

fasta_rs split --fasta <sequences.fasta> <optional_args>

Optional arguments:

-o/--outdir [fasta_split] - Output directory.

fasta_rs stats

Calculate basic stats.

fasta_rs stats --fasta <sequences.fasta> <optional_args>

Optional arguments:

-o/--outfile [stdout] - Output file.

fasta_rs fa2tab

Generate a .tsv file with basic information about each sequence.

fasta_rs fa2tab --fasta <sequences.fasta> <optional_args>

Optional arguments:

-o/--outfile [stdout] - Output file.

fasta_rs homopolymers

Find homopolymers in sequences.

fasta_rs homopolymers --fasta <sequences.fasta> <optional_args>

Optional arguments:

-m/--min-hp-len [5] - Min homopolymer length to consider.

-s/--strict [false] - Only consider homopolymers for {A, C, G, T, a, c, g, t}.

-o/--outfile [stdout] - Output file.
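
The options above can be illustrated with a minimal sketch of a run-length scan. This is not taken from the fasta_rs source: the `Homopolymer` struct and `find_homopolymers` function are hypothetical names, and the sketch assumes that upper- and lower-case bases form separate runs.

```rust
/// A homopolymer run: the repeated base, its 0-based start, and its length.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq)]
pub struct Homopolymer {
    pub base: u8,
    pub start: usize,
    pub len: usize,
}

/// Scan `seq` for runs of identical bytes of length >= `min_len`.
/// With `strict`, only runs of A/C/G/T (either case) are reported.
/// Assumption: 'A' and 'a' are treated as different bytes, so a case
/// change ends a run.
pub fn find_homopolymers(seq: &[u8], min_len: usize, strict: bool) -> Vec<Homopolymer> {
    let mut hits = Vec::new();
    let mut i = 0;
    while i < seq.len() {
        let base = seq[i];
        // Advance j to the first position that differs from `base`.
        let mut j = i + 1;
        while j < seq.len() && seq[j] == base {
            j += 1;
        }
        let is_acgt = matches!(base.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T');
        if j - i >= min_len && (!strict || is_acgt) {
            hits.push(Homopolymer { base, start: i, len: j - i });
        }
        i = j;
    }
    hits
}
```

For `ACGTTTTTTANNNNNNC` with a minimum length of 5, the sketch reports both the T run and the N run; in strict mode only the T run remains.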

fasta_rs filter

Filter sequences based on certain criteria.

fasta_rs filter --fasta <sequences.fasta> <optional_args>

Optional arguments:

--min-len [0] - Minimum sequence length.

--max-len [u64::MAX] - Maximum sequence length.

--min-gc [0.0] - Minimum GC content.

--max-gc [1.0] - Maximum GC content.

--min-ambig [0.0] - Minimum fraction ambiguous bases.

--max-ambig [1.0] - Maximum fraction ambiguous bases.

--min-softmask [0.0] - Minimum fraction softmasked bases.

--max-softmask [1.0] - Maximum fraction softmasked bases.

--min-entropy [0.0] - Minimum Shannon Entropy.

--max-entropy [100.0] - Maximum Shannon Entropy.

-o/--outfile [stdout] - Output file.
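
As a rough illustration of two of the metrics above, the following sketch computes GC content and Shannon entropy and applies length and GC bounds. The function names are hypothetical, and the definitions are assumptions rather than the crate's documented behavior: GC content is taken as the fraction of G/C bases over the full sequence length, entropy is computed in bits over the case-insensitive base composition, and all bounds are treated as inclusive.

```rust
/// Fraction of G/C bases (either case) over the full sequence length.
/// Returns 0.0 for an empty sequence.
pub fn gc_content(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .iter()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    gc as f64 / seq.len() as f64
}

/// Shannon entropy in bits of the case-insensitive base composition.
/// Assumption: the crate may use a different definition or log base.
pub fn shannon_entropy(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in seq {
        counts[b.to_ascii_uppercase() as usize] += 1;
    }
    let n = seq.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Keep a sequence when its length and GC content fall within the
/// given inclusive bounds (hypothetical helper).
pub fn passes_filter(seq: &[u8], min_len: usize, max_len: usize, min_gc: f64, max_gc: f64) -> bool {
    let gc = gc_content(seq);
    seq.len() >= min_len && seq.len() <= max_len && gc >= min_gc && gc <= max_gc
}
```

Under these assumptions, `GGCCAATT` has a GC content of 0.5 and passes `--min-gc 0.5`, while `AAAATTTT` does not.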

fasta_rs extract

Extract sub-sequence based on provided range.

fasta_rs extract --fasta <sequences.fasta> <optional_args>

Optional arguments:

-s/--start [0] - Start coordinate (BED offset).

-e/--end [u64::MAX] - End coordinate (BED offset).

-o/--outfile [stdout] - Output file.

Since the coordinates are BED-compatible (0-based start, exclusive end), extracting the i-th base (counting from 1) is equivalent to using -s i-1 and -e i.
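
The coordinate convention can be sketched as a plain slice operation. The `extract` function below is a hypothetical helper, not the crate's code, and it assumes that out-of-range coordinates are clamped to the sequence length rather than rejected.

```rust
/// Return `seq[start..end)` using 0-based, half-open (BED-style) coordinates.
/// Assumption: `end` is clamped to the sequence length and `start` to `end`,
/// so out-of-range requests yield a shorter or empty slice instead of a panic.
pub fn extract(seq: &[u8], start: usize, end: usize) -> &[u8] {
    let end = end.min(seq.len());
    let start = start.min(end);
    &seq[start..end]
}
```

With this convention, the third base of `ACGTACGT` is obtained with a start of 2 and an end of 3, and the default end of `u64::MAX` simply means "to the end of the sequence".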

fasta_rs sample

Sample sequences based on a number or proportion.

fasta_rs sample --fasta <sequences.fasta> <optional_args>

Optional arguments:

-b/--by [1.0] - Number or fraction of sequences to keep.

-o/--outfile [stdout] - Output file.

fasta_rs sort

Sort sequences by a given metric.

fasta_rs sort --fasta <sequences.fasta> <optional_args>

Optional arguments:

--by [length] - {length, id, gc, entropy, softmask, ambiguous}.

-r/--reverse [false] - Sort in descending order.

-o/--outfile [stdout] - Output file.
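
A minimal sketch of sorting by a chosen key with an optional reversal is shown below. The `Record` and `SortKey` types are hypothetical, and only two of the listed metrics (length and id) are covered; how the crate orders ties is not stated in this README.

```rust
/// A FASTA record reduced to its id and sequence (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub id: String,
    pub seq: Vec<u8>,
}

/// Subset of the sort metrics listed above, for illustration only.
pub enum SortKey {
    Length,
    Id,
}

/// Sort records in ascending order by `key`; `reverse` flips the result
/// to descending order. Note that reversing also flips the order of ties.
pub fn sort_records(records: &mut [Record], key: SortKey, reverse: bool) {
    match key {
        SortKey::Length => records.sort_by_key(|r| r.seq.len()),
        SortKey::Id => records.sort_by(|a, b| a.id.cmp(&b.id)),
    }
    if reverse {
        records.reverse();
    }
}
```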

fasta_rs shuffle

Randomly shuffle sequences.

fasta_rs shuffle --fasta <sequences.fasta> <optional_args>

Optional arguments:

-o/--outfile [stdout] - Output file.
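
One common way to shuffle records uniformly is the Fisher-Yates algorithm, sketched below. Whether fasta_rs uses this algorithm is an assumption. The `XorShift64` generator is a hypothetical stand-in that keeps the example dependency-free; it is not suitable where high-quality randomness matters, and the modulo step introduces a small bias.

```rust
/// Minimal xorshift64 generator, used only to avoid external crates here.
pub struct XorShift64(u64);

impl XorShift64 {
    /// A zero seed would get stuck at zero, so it is replaced by a constant.
    pub fn new(seed: u64) -> Self {
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fisher-Yates shuffle: walk from the last index down to 1 and swap each
/// element with a randomly chosen element at or before it.
pub fn shuffle<T>(items: &mut [T], rng: &mut XorShift64) {
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}
```

The result is always a permutation of the input, so no sequence is lost or duplicated.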

fasta_rs head

View the first n sequences.

fasta_rs head --fasta <sequences.fasta> <optional_args>

Optional arguments:

-n/--num_seqs [5] - Number of sequences to output.

fasta_rs grep

Search and filter sequence ids by regular expressions.

fasta_rs grep --fasta <sequences.fasta> --pattern <regex_string> <optional_args>

Optional arguments:

-o/--outfile [stdout] - Output file.

fasta_rs amplicon

In silico PCR by exact or fuzzy primer matching.

fasta_rs amplicon --fasta <sequences.fasta> --primers <primers.tsv> --search-type {exact, fuzzy} <optional_args>

Optional arguments:

-o/--outfile [stdout] - Output file.

Primer file

The TAB-separated primers.tsv file must specify the following for each primer pair:

  • Primer name.
  • Forward primer sequence (5' -> 3').
  • Reverse primer sequence (5' -> 3').
  • Expected minimum length of insert size.
  • Expected maximum length of insert size.
  • Number of allowed mismatches (only used for fuzzy search).
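
The file layout above could be read along the following lines. This is a sketch under assumptions, not the crate's parser: the `PrimerPair` struct and `parse_primer_line` function are hypothetical, the columns are assumed to appear in the order listed, there is assumed to be no header row, and all six columns are assumed to be present even for exact search.

```rust
/// One row of the primer file (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct PrimerPair {
    pub name: String,
    pub forward: String,
    pub reverse: String,
    pub min_len: usize,
    pub max_len: usize,
    pub max_mismatches: usize,
}

/// Parse one TAB-separated line into a `PrimerPair`.
/// Assumption: exactly six columns in the order listed above.
pub fn parse_primer_line(line: &str) -> Result<PrimerPair, String> {
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() != 6 {
        return Err(format!("expected 6 tab-separated columns, found {}", cols.len()));
    }
    // Small helper that parses a non-negative integer column.
    let num = |s: &str, what: &str| -> Result<usize, String> {
        s.trim()
            .parse::<usize>()
            .map_err(|e| format!("invalid {what} '{s}': {e}"))
    };
    let min_len = num(cols[3], "minimum length")?;
    let max_len = num(cols[4], "maximum length")?;
    if min_len > max_len {
        return Err(format!("minimum length {min_len} exceeds maximum length {max_len}"));
    }
    Ok(PrimerPair {
        name: cols[0].to_string(),
        forward: cols[1].to_string(),
        reverse: cols[2].to_string(),
        min_len,
        max_len,
        max_mismatches: num(cols[5], "mismatch count")?,
    })
}
```

Returning a `Result` lets a caller report the offending line instead of aborting on the first malformed row.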

fasta_rs compress

Homopolymer compress sequences.

fasta_rs compress --fasta <sequences.fasta> <optional_args>

Optional arguments:

-m/--max-hp-len [5] - Compress down to homopolymers of max provided length. E.g., ATCGGGGGGG with -m 3 outputs ATCGGG.

-o/--outfile [stdout] - Output file.
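
The compression rule can be sketched as a single pass that caps each run at the given length. The function name `compress_homopolymers` is hypothetical, and the sketch assumes runs are case-sensitive (an `A` followed by an `a` starts a new run).

```rust
/// Cap every run of identical bytes at `max_len` bases.
/// Assumption: comparison is case-sensitive.
pub fn compress_homopolymers(seq: &[u8], max_len: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(seq.len());
    let mut run = 0usize;
    for (i, &b) in seq.iter().enumerate() {
        // Extend the current run or start a new one.
        run = if i > 0 && seq[i - 1] == b { run + 1 } else { 1 };
        if run <= max_len {
            out.push(b);
        }
    }
    out
}
```

This reproduces the example above: `ATCGGGGGGG` with a maximum length of 3 becomes `ATCGGG`.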

fasta_rs reverse

Reverse complement sequences.

fasta_rs reverse --fasta <sequences.fasta> <optional_args>

Optional arguments:

-o/--outfile [stdout] - Output file.
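
Reverse complementing can be sketched as reversing the sequence and swapping each base for its partner. The function names are hypothetical, and the handling of non-ACGT symbols is an assumption: this sketch preserves case and leaves anything other than A, C, G and T unchanged, so IUPAC ambiguity codes other than N are not complemented.

```rust
/// Complement a single base, preserving case.
/// Assumption: symbols other than A/C/G/T are returned unchanged.
pub fn complement(b: u8) -> u8 {
    match b {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        b'a' => b't',
        b'c' => b'g',
        b'g' => b'c',
        b't' => b'a',
        other => other,
    }
}

/// Reverse the sequence and complement every base.
pub fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}
```

Applying the function twice returns the original sequence, which is a convenient sanity check.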