Crates.io | seqdupes |
lib.rs | seqdupes |
version | 0.2.0 |
source | src |
created_at | 2022-11-10 19:01:50.232808 |
updated_at | 2024-06-24 18:44:22.01403 |
description | Compress sequence duplicates |
homepage | https://stevenweaver.org |
repository | https://github.com/stevenweaver/seqdupes |
id | 712298 |
size | 67,863 |
Removes duplicates from FASTA files. Supports filtering based on sequence content or header information.
Download the source code and, from the repository root, run:
cargo install --path .
Run seqdupes to process FASTA files. You can filter duplicates either by sequence (the default) or by header.
seqdupes -f path/to/sequence.fasta -j path/to/output.json > no_dupes.fas
To filter duplicates by header rather than by sequence, add the --by-header flag.
seqdupes -f path/to/sequence.fasta -j path/to/output.json --by-header > no_dupes.fas
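The difference between the two modes can be illustrated with a minimal sketch. This is not the seqdupes source: `Record`, `parse_fasta`, and `dedupe` are hypothetical names, and keeping the first occurrence of each duplicate is an assumption about the tool's behavior.

```rust
use std::collections::HashSet;

/// One FASTA record: the header line (without '>') and its sequence.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    header: String,
    sequence: String,
}

/// Parse FASTA text into records, joining wrapped sequence lines.
/// Lines that appear before the first header are ignored.
fn parse_fasta(text: &str) -> Vec<Record> {
    let mut records: Vec<Record> = Vec::new();
    for line in text.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            records.push(Record {
                header: header.to_string(),
                sequence: String::new(),
            });
        } else if let Some(last) = records.last_mut() {
            last.sequence.push_str(line);
        }
    }
    records
}

/// Keep the first record for each key. The key is the header when
/// `by_header` is true and the sequence otherwise.
/// Assumption: the first occurrence is the one that is kept.
fn dedupe(records: &[Record], by_header: bool) -> Vec<Record> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut kept = Vec::new();
    for r in records {
        let key = if by_header {
            r.header.as_str()
        } else {
            r.sequence.as_str()
        };
        // `insert` returns false when the key was already present.
        if seen.insert(key) {
            kept.push(r.clone());
        }
    }
    kept
}
```

Calling `dedupe(&records, false)` mirrors the default mode, and `dedupe(&records, true)` mirrors --by-header. Note that two records can share a sequence but not a header (or the reverse), so the two modes may keep different records.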
Parameter | Default | Description |
---|---|---|
-f, --fasta | - | The path to the input FASTA file. |
-j, --json | - | The output path for listing duplicates. |
-b, --by-header | - | Enables filtering based on headers (optional). |
The tool writes the FASTA records with duplicates removed to stdout and writes a JSON file containing details of the duplicates to the path given with -j.
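The split between the two outputs can be sketched as follows. The JSON layout used here (an object mapping each kept header to the headers of its duplicates) is purely an assumption, since the actual schema is not described above; `json_escape` and `write_outputs` are hypothetical helpers, and the sketch deduplicates by sequence only.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Escape the characters that must not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Write unique records as FASTA to `fasta_out` and a duplicate listing as
/// JSON to `json_out`. Input is a slice of (header, sequence) pairs.
fn write_outputs<W: Write, J: Write>(
    records: &[(&str, &str)],
    mut fasta_out: W,
    mut json_out: J,
) -> io::Result<()> {
    // Maps each sequence to the index of its group in `groups`.
    let mut first_seen: HashMap<&str, usize> = HashMap::new();
    // (kept header, headers of later duplicates), in input order.
    let mut groups: Vec<(&str, Vec<&str>)> = Vec::new();

    for &(header, seq) in records {
        let existing = first_seen.get(seq).copied();
        match existing {
            Some(i) => groups[i].1.push(header),
            None => {
                first_seen.insert(seq, groups.len());
                groups.push((header, Vec::new()));
                // First occurrence: emit it on the FASTA stream.
                writeln!(fasta_out, ">{}\n{}", header, seq)?;
            }
        }
    }

    // Assumed layout: {"kept header":["duplicate header",...],...}
    let entries: Vec<String> = groups
        .iter()
        .filter(|(_, dups)| !dups.is_empty())
        .map(|(kept, dups)| {
            let list: Vec<String> = dups
                .iter()
                .map(|d| format!("\"{}\"", json_escape(d)))
                .collect();
            format!("\"{}\":[{}]", json_escape(kept), list.join(","))
        })
        .collect();
    writeln!(json_out, "{{{}}}", entries.join(","))
}
```

In a command-line setting, `fasta_out` would typically be `std::io::stdout().lock()` and `json_out` a `std::fs::File` created at the -j path, which is why the shell examples redirect stdout to no_dupes.fas.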