| Crates.io | hashfasta |
| lib.rs | hashfasta |
| version | 1.0.0 |
| created_at | 2026-01-24 22:45:40.582443+00 |
| updated_at | 2026-01-24 22:45:40.582443+00 |
| description | Very quickly compute hashes for FASTA/FASTQ files considering **only** the sequence content. |
| homepage | |
| repository | https://github.com/Sam-Sims/hashfasta |
| max_upload_size | |
| id | 2067655 |
| size | 106,171 |
hash fasta (faster?)
Very quickly compute hashes for FASTA/FASTQ files considering only the sequence content.
Hashfasta produces a hash for an input file that is stable and dependent only on the sequence content. It ignores:
It can optionally consider only the canonical sequence (the lexicographically smaller of the forward and reverse complement) and/or normalise sequences before hashing. See Sequence handling options.
It supports FASTA and FASTQ files (optionally compressed with gz) as input, and can read via http/https, SSH and stdin.
Precompiled binaries for Linux, MacOS and Windows are attached to the latest release.
Requires cargo
cargo install hashfasta
To install please refer to the rust documentation: docs
git clone https://github.com/Sam-Sims/hashfasta
cd hashfasta
cargo build --release
export PATH=$PATH:$(pwd)/target/release
All executables will be in the directory hashfasta/target/release.
Very quickly compute hashes for FASTX files considering **only** the sequence content.
Usage: hashfasta <COMMAND>
Commands:
hash Hash every record in the input and output a final aggregate_hash representing the sequence content of the entire file.
unique Output only the records whose sequences are unique within the input.
duplicate Output only the records whose sequences are duplicates of earlier records.
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
Each subcommand takes a single positional argument <FASTX> which is the path/URL to the input FASTA/FASTQ file (or - for stdin).
By default, output is written to stdout in tab-separated format. Use --json to output in JSON format. See Output Formats.
hashHash every record in the input and output a final "aggregate_hash" representing the seqeunce content of the entire file. Suppress per-record output with -q/--quiet.
hashfasta hash [OPTIONS] <FASTX>
uniqueOutput only the records whose sequences are unique within the input. Useful for deduplicating records.
hashfasta unique [OPTIONS] <FASTX>
duplicateOutput only the records whose sequences are duplicates of earlier records.
hashfasta duplicate [OPTIONS] <FASTX>
All subcommands share the following options:
-c, --canonical: Use the canonical sequence. See Canonicalisation.-n, --normalise: Normalise sequences before hashing. See Normalisation.-s, --strict: Fail if non-ACGTUN- bases are encountered.-t, --threads: Number of threads [default: 1].-j, --json: Output records as JSON instead of TSV.-q, --quiet: Only print the aggregate hash (suppress record-level output).-n / --normalise)Normalisation ensures that sequences are treated consistently regardless of case or RNA/DNA differences. It performs the following:
U) to Thymine (T).ACGTU-) as N.-c / --canonical)Canonicalisation does the following:
This ensures that a sequence and its reverse complement will always produce the same hash.
# Generate hashes for a single fasta file
hashfasta hash sequences.fasta
# Hash a compressed file normalising sequences, and consider canonical sequences
hashfasta hash -cn reads.fq.gz
# Fail early if invalid bases are encountered
hashfasta hash --strict input.fasta
# Output as JSON, instead of TSV
hashfasta hash --json input.fasta
# Read from stdin
tar -xOf collection.tar.gz | hashfasta hash -
# Read from HTTP
hashfasta hash https://example.com/sequences.fasta
# Read from SSH
hashfasta hash ssh://user@host/path/to/sequences.fasta
Default (TSV):
id hash
seq1 537edb87f29c16e5
seq2 2cbf6181d9bdb039
aggregate_hash 02402baf8722f975
JSON (--json):
{
"records": [
{
"id": "seq1",
"hash": "537edb87f29c16e5"
},
{
"id": "seq2",
"hash": "2cbf6181d9bdb039"
}
],
"aggregate_hash": "02402baf8722f975"
}