| Crates.io | merkurio |
| lib.rs | merkurio |
| version | 1.0.2 |
| created_at | 2025-07-25 09:43:51.97363+00 |
| updated_at | 2025-11-28 16:05:04.725275+00 |
| description | Quick k-mer-based FASTA/FASTQ sequence record extraction, and SAM/BAM record filtering plus file annotation with k-mer tags. |
| homepage | |
| repository | https://github.com/lschoenm/MerKurio |
| max_upload_size | |
| id | 1767350 |
| size | 19,627,001 |
MerKurio is a command line tool for extracting records from FASTA/FASTQ files based on k-mers, and for annotating and filtering aligned sequences in BAM/SAM format with k-mer tags.
It was developed to simplify downstream analysis of selected k-mers by tracing them back to their sequences in the original data.
MerKurio is designed to be user-friendly, flexible, fast, and memory efficient.
- Documentation
- Features
- Example Workflow
- Usage
- Installation
The full documentation can be found ➡️ here.
A quick overview of the features and usage is provided below.
MerKurio provides two complementary subcommands:
km) with comma-separated matching k‑mers (follows the SAM format specification).
Both commands share additional features:
.gz, .bz2, .xz).
Two examples are prepared in this repository intended for Unix-like systems:
For a quick and minimal example, follow the steps described here.
A more detailed practical example/tutorial can be found in the repo or the docs.
Although the tool is flexible and can be used in a variety of ways, it was designed with the following workflow in mind:
If sequencing reads were extracted:
Run merkurio or one of its subcommands with the --help flag to see the available options and subcommands.
extract Subcommand
Running merkurio extract extracts records from a sequence file (FASTA or FASTQ; the format is inferred automatically) based on a list of query sequences (k-mers). The query sequences can be provided in a file or as a list of strings on the command line. Reverse complements can be included in the search. The extracted records are written to stdout or to a new file in the same format as the input file. The tool tries to select the most efficient algorithm for the given query sequences.
Detailed match statistics are written to stdout or to a file if specified, showing which records got hit by sequences along with a zero-based position. Matching statistics can also be saved in JSON format for easier parsing by other programs.
An example usage of the extract subcommand to extract records from a FASTA file based on a list of k-mers and their reverse complements is shown below; logging information is written to stdout:
merkurio extract -i input.fasta -o output.fasta -f query_kmers.txt -r -l
Another example where paired-end reads are extracted if they contain the sequence ACGT or TGCA; the extracted records are written to stdout and logging information is written to a file called log.txt (the -i and -1 flags can be used interchangeably):
merkurio extract -1 input_R1.fastq -2 input_R2.fastq -o output -s ACGT TGCA -l log.txt
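To make the matching behaviour described above concrete, here is a minimal Python sketch of k-mer-based record extraction with optional reverse complements and zero-based hit positions. It is a naive illustration only and not MerKurio's actual search algorithm; the function names (`reverse_complement`, `find_kmer_hits`, `extract_records`) and the `(header, sequence)` record layout are assumptions made for this example.

```python
# Illustrative sketch only: a naive substring search, not MerKurio's implementation.
# All names here are hypothetical and chosen for this example.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_kmer_hits(sequence, kmers, include_revcomp=False):
    """Return (k-mer, zero-based position) pairs for every occurrence in sequence."""
    queries = set(kmers)
    if include_revcomp:
        queries |= {reverse_complement(k) for k in kmers}
    hits = []
    for query in queries:
        start = sequence.find(query)
        while start != -1:
            hits.append((query, start))
            # Advance by one so overlapping occurrences are also reported.
            start = sequence.find(query, start + 1)
    # Sort by position, then by k-mer, for a stable report order.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def extract_records(records, kmers, include_revcomp=False):
    """Yield (header, sequence, hits) for records containing at least one k-mer."""
    for header, sequence in records:
        hits = find_kmer_hits(sequence, kmers, include_revcomp)
        if hits:
            yield header, sequence, hits


if __name__ == "__main__":
    reads = [("read1", "TTAACGTT"), ("read2", "CCCCCCCC")]
    for header, sequence, hits in extract_records(reads, ["AAC"], include_revcomp=True):
        print(header, sequence, hits)
```

In this sketch, `read1` is kept because it contains the query `AAC` at position 2 and its reverse complement `GTT` at position 5, while `read2` is dropped. A real tool would typically use a multi-pattern search algorithm rather than repeated substring scans.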
tag Subcommand
Running merkurio tag tags aligned sequences in a BAM/SAM file with k-mers. If a record contains one or more of the k-mers, it is annotated with a tag ("km" by default; must be exactly two characters long) and the respective k-mers. Multithreading is supported for BAM files. Optionally, only records that match at least one k-mer are kept.
Detailed match statistics are written to stdout or to a file if specified, showing which records got hit by sequences along with a zero-based position. Matching statistics can also be saved in JSON format for easier parsing. Matching records output can be suppressed if one is only interested in the matching statistics.
An example usage of the tag subcommand to tag a BAM file with the k-mers in the file query_kmers.fasta, with SAM output:
merkurio tag -i input.bam -o output.sam -f query_kmers.fasta
Another example where the k-mers are provided on the command line, and the search is also performed for their reverse complements. The tag is set to "MK". BAM file processing is done with 4 threads:
merkurio tag -i input.bam -o output.bam -s ACGT TGCA -r -p 4 -t MK
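The tagging step can be pictured as appending an optional field to each matching alignment line. The Python sketch below shows this for a single text SAM record, under stated assumptions: it uses the SAM optional-field layout `TAG:TYPE:VALUE` with the string type `Z`, which may differ from what MerKurio actually writes, and the function name `tag_sam_record` is hypothetical. It is not taken from MerKurio's code.

```python
# Illustrative sketch only; the tag type "Z" and all names are assumptions.


def tag_sam_record(line, kmers, tag="km"):
    """Append a k-mer tag to a SAM line if its sequence contains any query k-mer.

    Returns (line, matched), where matched reports whether a k-mer was found.
    """
    if len(tag) != 2:
        raise ValueError("SAM tags must be exactly two characters long")
    line = line.rstrip("\n")
    if line.startswith("@"):
        # Header lines carry no sequence and are passed through unchanged.
        return line, False
    fields = line.split("\t")
    sequence = fields[9]  # SEQ is the 10th mandatory column of a SAM record
    matched = [kmer for kmer in kmers if kmer in sequence]
    if not matched:
        return line, False
    # Optional fields follow the mandatory columns as TAG:TYPE:VALUE.
    fields.append(f"{tag}:Z:{','.join(matched)}")
    return "\t".join(fields), True


if __name__ == "__main__":
    record = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tTTACGTAA\tIIIIIIII"
    tagged, matched = tag_sam_record(record, ["ACGT", "GGGG"])
    print(matched)
    print(tagged)
```

Filtering to matching records only then amounts to keeping the lines for which the second return value is true.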
You can install MerKurio in several ways, depending on your system and whether you have Rust installed.
1. Precompiled Binaries (No Rust Needed)
2. Install via Cargo (Requires Rust)
3. Build Manually Without Installing (Requires Rust)
After installation, verify that it works by running:
merkurio --help
Or, if you didn't add it to your PATH:
./path/to/merkurio --help
Download a binary for Linux, Windows, or macOS from the releases page, then extract the archive:
tar -xzf path/to/release.tar.gz
On Linux/macOS, make it executable if needed:
chmod u+x path/to/merkurio
The merkurio-x86_64-unknown-linux-musl binary is compatible with a wider range of systems but can have worse performance.
If you have Rust installed (edition 2024), the easiest way is:
cargo install merkurio
This pulls the latest version from crates.io.
To install a tagged release from GitHub:
cargo install --git https://github.com/lschoenm/MerKurio --tag vX.X.X
To build manually without installing, clone the repository and build it with Cargo:
git clone https://github.com/lschoenm/MerKurio
cd MerKurio
cargo build --release
The binary will be in target/release/.
The code in this repository is licensed under the MIT license.
Test data and example files in the tests/ and example-minimal/ directories are licensed under the CC0 1.0 Universal license.