| Crates.io | primerpincer |
| lib.rs | primerpincer |
| version | 0.8.0 |
| created_at | 2025-11-12 20:25:46.474801+00 |
| updated_at | 2025-11-21 16:25:03.622867+00 |
| description | A CLI primer trimming tool for long-read sequencing data |
| homepage | https://github.com/mauricebarrett/primerpincer |
| repository | https://github.com/mauricebarrett/primerpincer |
First, install Cargo (via rustup):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Now you can install primerpincer. The most straightforward way is:
cargo install primerpincer
However, to enable SIMD optimizations in Sassy, build for your native CPU with the following command:
RUSTFLAGS="-C target-cpu=native" cargo install primerpincer
PrimerPincer is a Rust-based command-line tool designed to efficiently detect and remove pairs (forward and reverse) of primers from single-end amplicon reads in FASTQ format, with a particular focus on long-read sequencing data generated by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT).
In amplicon-based microbiome studies, such as those targeting 16S, ITS, 18S, or COI regions, primer removal is a crucial preprocessing step: the phylogenetically conserved regions where primers bind are typically removed before downstream analysis.
The rise of third-generation sequencing platforms from PacBio and ONT has enabled the use of much longer marker gene regions than was previously feasible—such as the full-length 16S (V1–V9), 16S–ITS–23S operon, or 18S–ITS–28S operon. Additionally, the throughput and read counts produced per run continue to increase, driving a steady growth in the total volume of sequencing data generated.
PrimerPincer is designed to scale with these demands, providing rapid and accurate primer identification and removal for long-read datasets.
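Choose the best algorithm for your use case:
- sassy (default): pattern matching algorithm described in Beeloo and Koerkamp (2025)
- myers: Rust Bio's Myers bit-parallel algorithm
- hamming: Hamming distance; tolerates mismatches but not indels
- bndm: Rust Bio's BNDM algorithm; exact matching only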
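To make the difference between exact and mismatch-tolerant matching concrete, here is a minimal sketch of a Hamming-distance search. It is not PrimerPincer's implementation; the function names `hamming` and `find_with_mismatches` are hypothetical and chosen only for illustration. With `max_mismatches` set to 0 the search behaves like an exact matcher, and in neither case can it recover a primer that contains an insertion or deletion.

```rust
/// Count mismatching positions between two equal-length byte slices.
fn hamming(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// Return the first offset in `text` where `pattern` occurs with at most
/// `max_mismatches` substitutions (hypothetical helper, no indel support).
fn find_with_mismatches(text: &[u8], pattern: &[u8], max_mismatches: usize) -> Option<usize> {
    if pattern.is_empty() || pattern.len() > text.len() {
        return None;
    }
    text.windows(pattern.len())
        .position(|w| hamming(w, pattern) <= max_mismatches)
}

fn main() {
    let read = b"TTACGTAA";
    // One substitution allowed: "ACGT" at offset 2 is within distance 1 of "ACGA".
    println!("{:?}", find_with_mismatches(read, b"ACGA", 1));
    // Exact matching only: "ACGA" does not occur in the read.
    println!("{:?}", find_with_mismatches(read, b"ACGA", 0));
}
```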
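Automatically handles common compression formats via niffler:
- Output compression is set with the --compression flag (gzip, bzip2, xz, zstd, or uncompressed; defaults to gzip)
Full support for IUPAC nucleotide ambiguity codes in primer sequences: a degenerate base in a primer, such as R (A or G) or Y (C or T), matches any of the bases it stands for.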
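As an illustration of what IUPAC-aware comparison means, the sketch below counts mismatches between a degenerate primer and a stretch of read. The code table is the standard IUPAC nucleotide alphabet; the function names `iupac_matches` and `iupac_mismatches` are hypothetical and do not reflect how PrimerPincer or its matching backends implement this.

```rust
/// Return true if the IUPAC `code` from a primer is compatible with the read `base`.
fn iupac_matches(code: u8, base: u8) -> bool {
    let allowed: &str = match code.to_ascii_uppercase() {
        b'A' => "A",
        b'C' => "C",
        b'G' => "G",
        b'T' | b'U' => "T",
        b'R' => "AG",
        b'Y' => "CT",
        b'S' => "CG",
        b'W' => "AT",
        b'K' => "GT",
        b'M' => "AC",
        b'B' => "CGT",
        b'D' => "AGT",
        b'H' => "ACT",
        b'V' => "ACG",
        b'N' => "ACGT",
        _ => "", // unknown codes match nothing in this sketch
    };
    allowed.as_bytes().contains(&base.to_ascii_uppercase())
}

/// Count positions where the read base is not covered by the primer code.
/// Assumes both slices have the same length.
fn iupac_mismatches(primer: &[u8], read: &[u8]) -> usize {
    primer
        .iter()
        .zip(read)
        .filter(|(p, r)| !iupac_matches(**p, **r))
        .count()
}

fn main() {
    // Forward primer from the usage example, compared with a concrete sequence
    // in which every degenerate position is filled by an allowed base.
    let primer = b"AGRGTTYGATYMTGGCTCAG";
    let read_prefix = b"AGAGTTTGATCATGGCTCAG";
    println!("mismatches: {}", iupac_mismatches(primer, read_prefix));
}
```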
The tool checks the forward orientation first, followed by the reverse orientation.
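Handling both orientations generally relies on reverse-complementing sequences, including their ambiguity codes. The sketch below shows one way to do that for IUPAC primers; the helper names `complement` and `reverse_complement` are hypothetical, and how PrimerPincer itself applies the two orientation checks is not shown here.

```rust
/// Complement a single IUPAC nucleotide code (upper-cases the input).
fn complement(base: u8) -> u8 {
    match base.to_ascii_uppercase() {
        b'A' => b'T',
        b'T' | b'U' => b'A',
        b'C' => b'G',
        b'G' => b'C',
        b'R' => b'Y',
        b'Y' => b'R',
        b'K' => b'M',
        b'M' => b'K',
        b'B' => b'V',
        b'V' => b'B',
        b'D' => b'H',
        b'H' => b'D',
        // S, W and N are their own complements; anything else passes through.
        other => other,
    }
}

/// Reverse-complement a primer or read.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

fn main() {
    // Reverse primer from the usage example.
    let reverse_primer = b"RGYTACCTTGTTACGACTT";
    let rc = reverse_complement(reverse_primer);
    println!("{}", String::from_utf8_lossy(&rc));
}
```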
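Optional size filtering can be applied with --min-length and --max-length (both bounds are inclusive and refer to the read length after trimming).
Reads whose average Phred quality score falls below the threshold set with --min-average-quality are filtered out.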
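The length and quality filters can be pictured as a simple predicate over each trimmed read. The sketch below is an illustration under two assumptions that are not confirmed by the PrimerPincer documentation: qualities are Phred+33 encoded, and the average is the arithmetic mean of the Phred scores (some tools instead average the underlying error probabilities, which gives lower values). The names `mean_phred` and `passes_filters` are hypothetical.

```rust
/// Arithmetic mean of the Phred scores in a FASTQ quality string.
/// Assumes Phred+33 encoding; returns 0.0 for an empty string.
fn mean_phred(qual: &[u8]) -> f64 {
    if qual.is_empty() {
        return 0.0;
    }
    let total: u64 = qual.iter().map(|&q| u64::from(q.saturating_sub(33))).sum();
    total as f64 / qual.len() as f64
}

/// Apply optional inclusive length bounds and an optional minimum mean quality.
/// A filter set to `None` is skipped.
fn passes_filters(
    seq_len: usize,
    qual: &[u8],
    min_len: Option<usize>,
    max_len: Option<usize>,
    min_avg_q: Option<f64>,
) -> bool {
    min_len.map_or(true, |m| seq_len >= m)
        && max_len.map_or(true, |m| seq_len <= m)
        && min_avg_q.map_or(true, |q| mean_phred(qual) >= q)
}

fn main() {
    let qual = b"IIII"; // 'I' is Phred 40 under Phred+33
    println!("mean quality: {}", mean_phred(qual));
    println!("kept: {}", passes_filters(4, qual, Some(4), None, Some(20.0)));
}
```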
PrimerPincer - a CLI tool for the rapid identification and removal of paired primers from long read amplicons
Usage: primerpincer [OPTIONS] --input <FILE> --output <FILE> --forward <SEQUENCE> --reverse <SEQUENCE>
Options:
-i, --input <FILE>
Input FASTQ file
-o, --output <FILE>
Output FASTQ file
-f, --forward <SEQUENCE>
Forward primer sequence (5' to 3' orientation)
-r, --reverse <SEQUENCE>
Reverse primer sequence (5' to 3' orientation)
-a, --algorithm <ALGORITHM>
Algorithm to use for primer matching
Possible values:
- sassy: Pattern matching algorithm as described in Beeloo and Koerkamp (2025)
- myers: Rust Bio's Myers bit-parallel algorithm, very similar to Edlib's algorithm as described in Šošić and Šikić (2017)
- hamming: Hamming distance algorithm as described in Waterman and Eggert (1987). Can tolerate mismatches but not indels
- bndm: Rust Bio's BNDM exact pattern matching algorithm as described in Baeza-Yates and Gonnet (1992). Exact matching only. No mismatch or indels tolerated
[default: sassy]
-e, --error-rate <FLOAT>
Maximum error rate in primer matching (e.g., 0.15 for 15% errors)
[default: 0.15]
-w, --window-size <INT>
Window size to search for primer at start and end of sequence
[default: 100]
-O, --overlap <MINLENGTH>
Minimum overlap length. Require MINLENGTH bases of the primer to match (default 6)
[default: 6]
-t, --threads <INT>
Number of threads to use
[default: 4]
-c, --compression <COMPRESSION>
Compression format for the output FASTQ (defaults to gzip)
Possible values:
- none: No compression; write plain text FASTQ
- gzip: Standard gzip compression
- bzip2: bzip2 compression
- xz: LZMA/XZ compression
- zstd: Zstandard compression
[default: gzip]
-m, --min-length <INT>
Minimum read length after trimming (inclusive)
-M, --max-length <INT>
Maximum read length after trimming (inclusive)
-q, --min-average-quality <FLOAT>
Minimum Average Quality Score
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
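To give a feel for how --error-rate and --window-size interact, the sketch below converts an error rate into an absolute error budget for a primer and extracts the two search windows from a read. Both helpers are hypothetical: rounding the budget down with `floor` is an assumption borrowed from common trimming tools, and the tool's actual rounding and window handling may differ.

```rust
/// Maximum number of errors allowed for a primer of `primer_len` bases.
/// Assumes the budget is rounded down; this rounding rule is an assumption.
fn max_errors(primer_len: usize, error_rate: f64) -> usize {
    (primer_len as f64 * error_rate).floor() as usize
}

/// Return the first and last `window` bases of a read, clamped to the read length.
fn search_windows(read: &[u8], window: usize) -> (&[u8], &[u8]) {
    let w = window.min(read.len());
    (&read[..w], &read[read.len() - w..])
}

fn main() {
    // A 20-base primer at the default error rate of 0.15.
    println!("error budget: {}", max_errors(20, 0.15));

    let read = b"ACGTACGT";
    let (head, tail) = search_windows(read, 3);
    println!(
        "head = {}, tail = {}",
        String::from_utf8_lossy(head),
        String::from_utf8_lossy(tail)
    );
}
```

Under these assumptions, a shorter primer gets a smaller budget at the same error rate, and a read shorter than the window is searched in full at both ends.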
primerpincer \
-i ./example_data/raw/ATCC-MSA1003-toy-example.fastq.gz \
-o ~/primerpincer_proccesed/ATCC-MSA1003-toy-example.fastq.gz \
-f "AGRGTTYGATYMTGGCTCAG" \
-r "RGYTACCTTGTTACGACTT" \
-t 12 \
-a sassy \
-O 6 \
-m 500
Contributions to PrimerPincer are welcome! Here are some ways you can contribute:
- Create a feature branch (git checkout -b feature/amazing-feature)
- Format your code with cargo fmt and ensure it passes cargo clippy --all-targets -- -D warnings
- Commit your changes with a descriptive message (e.g. feat: add new algorithm, fix: resolve compilation error)
- Push the branch (git push origin feature/amazing-feature)
CI Checks: All pull requests will be automatically checked by our CI workflow (.github/workflows/ci.yaml):
- cargo fmt --all -- --check
- cargo check --all-targets
- cargo clippy --all-targets -- -D warnings
All CI checks must pass before your PR can be merged.
If you use PrimerPincer in your research, please cite:
Beeloo, R. & Groot Koerkamp, R. Sassy: Searching Short DNA Strings in the 2020s. 2025.07.22.666207 Preprint at https://doi.org/10.1101/2025.07.22.666207 (2025).
This project is licensed under the MIT License - see the LICENSE file for details.