fasta-cleaner

version: 1.0.1
created_at: 2024-12-12 12:37:37.724116+00
updated_at: 2024-12-12 13:37:47.871047+00
description: Transform fasta files by upper-casing all sequence characters and removing non-ACGT sequence characters.
id: 1481228
size: 25,983
Sebastian Schmidt (sebschmi)


# Fasta Cleaner

Cleans fasta files. Every run of newlines and carriage returns is replaced by a single newline. Record headers are left unchanged. Sequences are converted to upper case, and every character other than A, C, G or T is removed.
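
The per-character rule can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's actual code; the function name `clean_sequence` is hypothetical.

```rust
/// Upper-case a raw sequence line and keep only A, C, G and T.
/// Hypothetical helper for illustration; not part of the crate's API.
fn clean_sequence(raw: &str) -> String {
    raw.chars()
        // Lower-case bases such as 'c' become 'C' before filtering.
        .map(|c| c.to_ascii_uppercase())
        // Everything that is not one of the four bases is dropped.
        .filter(|c| matches!(c, 'A' | 'C' | 'G' | 'T'))
        .collect()
}
```

Under this sketch, `clean_sequence("AACCcxXAA")` yields `AACCCAA`, and a line without any valid base, such as `.ef34`, yields an empty string.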

Although characters are removed, the line width of the input file is preserved. The width is guessed from the first sequence line of the input, and all subsequent sequence lines are re-wrapped to match. Re-wrapping only moves line breaks; it never removes valid sequence characters.
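
One way the whole pass could work is sketched below. It is an illustration under stated assumptions, not the crate's implementation: the function name `clean_fasta` is hypothetical, and the width is assumed to be the raw length of the first sequence line, counted before any characters are removed.

```rust
/// Sketch of the cleaning pass; hypothetical, not the crate's actual code.
fn clean_fasta(input: &str) -> String {
    let mut out = String::new();
    // Assumption: guessed once, from the first raw sequence line.
    let mut guessed_width: Option<usize> = None;
    // Number of bases already written to the current output line.
    let mut column = 0;

    // Splitting on '\n' and '\r' and skipping empty pieces collapses every
    // run of line terminators into a single line break.
    let lines = input
        .split(|c| c == '\n' || c == '\r')
        .filter(|line| !line.is_empty());

    for line in lines {
        if line.starts_with('>') {
            // Close a partially filled sequence line, then copy the header verbatim.
            if column > 0 {
                out.push('\n');
                column = 0;
            }
            out.push_str(line);
            out.push('\n');
            continue;
        }

        // Lines are non-empty here, so the guessed width is at least 1.
        let width = *guessed_width.get_or_insert(line.chars().count());

        for c in line.chars().map(|c| c.to_ascii_uppercase()) {
            if matches!(c, 'A' | 'C' | 'G' | 'T') {
                out.push(c);
                column += 1;
                // Only line breaks move; valid bases are never dropped.
                if column == width {
                    out.push('\n');
                    column = 0;
                }
            }
        }
    }

    // Terminate the last sequence line if it is partially filled.
    if column > 0 {
        out.push('\n');
    }
    out
}
```

Because the column counter is carried across input lines and only reset at a record header, bases freed up by removed characters are filled in from the following lines instead of leaving short lines in the middle of a record.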

## Example

Input:

```
\r>WGCaC\n\nAACCcxXAA\naacc\n.ef34\nCGG\ntgtcgcgtagcgtgatcgtgtagtcgtag\r.\r>f\nTTT
```

Output:

```
>WGCaC\nAACCCAAAA\nCCCGGTGTC\nGCGTAGCGT\nGATCGTGTA\nGTCGTAG\n>f\nTTT\n
```

## Known Issues

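If the first sequence line is shorter than the line width of the input fasta file, the width is guessed too small, and all sequence lines in the output fasta file are wrapped to that shorter width. For example, if the input is wrapped at 60 characters but its first record consists of a single 10-character sequence line, the output is wrapped at 10 characters.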