| Crates.io | simdna |
| lib.rs | simdna |
| version | 1.0.2 |
| created_at | 2025-12-17 05:49:31.456988+00 |
| updated_at | 2025-12-19 02:24:51.544736+00 |
| description | High-performance SIMD-accelerated DNA sequence encoding supporting all IUPAC nucleotide codes |
| homepage | |
| repository | https://github.com/Rbfinch/simdna |
| max_upload_size | |
| id | 1989339 |
| size | 1,950,143 |
High-performance DNA/RNA sequence encoding and decoding using SIMD instructions with automatic fallback to scalar implementations.
Add simdna to your Cargo.toml:
[dependencies]
simdna = "1.0.2"
Or install via cargo:
cargo add simdna
simdna supports the complete IUPAC nucleotide alphabet with a bit-rotation-compatible encoding scheme. This encoding enables efficient complement calculation via a simple 2-bit rotation operation.
| Code | Meaning | Value | Complement |
|---|---|---|---|
| A | Adenine | 0x1 | T (0x4) |
| C | Cytosine | 0x2 | G (0x8) |
| G | Guanine | 0x8 | C (0x2) |
| T | Thymine | 0x4 | A (0x1) |
| U | Uracil (RNA → T) | 0x4 | A (0x1) |
| Code | Meaning | Value | Complement |
|---|---|---|---|
| R | A or G (purine) | 0x9 | Y (0x6) |
| Y | C or T (pyrimidine) | 0x6 | R (0x9) |
| S | G or C (strong) | 0xA | S (0xA) |
| W | A or T (weak) | 0x5 | W (0x5) |
| K | G or T (keto) | 0xC | M (0x3) |
| M | A or C (amino) | 0x3 | K (0xC) |
| Code | Meaning | Value | Complement |
|---|---|---|---|
| B | C, G, or T (not A) | 0xE | V (0xB) |
| D | A, G, or T (not C) | 0xD | H (0x7) |
| H | A, C, or T (not G) | 0x7 | D (0xD) |
| V | A, C, or G (not T) | 0xB | B (0xE) |
| Code | Meaning | Value | Complement |
|---|---|---|---|
| N | Any base | 0xF | N (0xF) |
| - | Gap / deletion | 0x0 | - (0x0) |
| . | Gap (alternative) | 0x0 | - (0x0) |
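The values in the tables above are consistent with reading each code as a 4-bit set of bases (A = 0x1, C = 0x2, T = 0x4, G = 0x8), where an ambiguity code is the bitwise OR of the bases it can stand for. The sketch below illustrates that reading only; the constant and function names are hypothetical and are not part of simdna's API.

```rust
// Single-base bit values, taken from the tables above.
const A: u8 = 0x1;
const C: u8 = 0x2;
const T: u8 = 0x4;
const G: u8 = 0x8;

/// Returns true if the 4-bit IUPAC code `code` permits the single base `base`.
/// (Hypothetical helper for illustration.)
fn permits(code: u8, base: u8) -> bool {
    (code & base) != 0
}

/// Number of concrete bases a code can stand for: 0 for a gap, 4 for N.
/// (Hypothetical helper for illustration.)
fn degeneracy(code: u8) -> u32 {
    (code & 0xF).count_ones()
}
```

Under this reading, `A | G` gives 0x9 (R), `C | G | T` gives 0xE (B), and `permits(0x9, G)` holds while `permits(0x9, C)` does not.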
The encoding is designed so that the complement of any nucleotide can be computed via a 2-bit rotation:
complement = ((bits << 2) | (bits >> 2)) & 0xF
This enables SIMD-accelerated reverse complement operations that are ~2x faster than lookup table approaches.
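As a scalar illustration of the rotation formula, the sketch below applies it to a single 4-bit code and, as a variation, to two codes packed in one byte. The function names are hypothetical, and the packed variant is an assumption about how such a rotation could be applied to a byte, not a description of simdna's SIMD kernels.

```rust
/// Complement one 4-bit IUPAC code by rotating its bits by two positions.
fn complement(bits: u8) -> u8 {
    ((bits << 2) | (bits >> 2)) & 0xF
}

/// Complement both 4-bit codes packed in one byte without unpacking them.
/// The masks keep each rotated bit pair inside its own nibble.
fn complement_packed(byte: u8) -> u8 {
    ((byte << 2) & 0xCC) | ((byte >> 2) & 0x33)
}
```

For example, `complement(0x1)` (A) yields 0x4 (T), `complement(0x9)` (R) yields 0x6 (Y), and applying `complement` twice returns the original code.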
use simdna::dna_simd_encoder::{encode_dna_prefer_simd, decode_dna_prefer_simd};
// Encode a DNA sequence with IUPAC codes
let sequence = b"ACGTNRYSWKMBDHV-";
let encoded = encode_dna_prefer_simd(sequence);
// The encoded data is 2x smaller (2 nucleotides per byte)
assert_eq!(encoded.len(), sequence.len() / 2);
// Decode back to the original sequence
let decoded = decode_dna_prefer_simd(&encoded, sequence.len());
assert_eq!(decoded, sequence);
// RNA sequences work seamlessly (U maps to T)
let rna = b"ACGU";
let encoded_rna = encode_dna_prefer_simd(rna);
let decoded_rna = decode_dna_prefer_simd(&encoded_rna, rna.len());
assert_eq!(decoded_rna, b"ACGT"); // U decodes as T
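To make the "2 nucleotides per byte" layout concrete, here is a minimal scalar sketch of 4-bit packing using the code values from the tables above. Several details are assumptions made for illustration and may differ from simdna: the first nucleotide of each pair goes in the high nibble, an odd-length sequence is padded with 0x0, and unrecognised bytes cause `None` to be returned. All function names are hypothetical.

```rust
/// Map an ASCII nucleotide (either case) to its 4-bit code.
/// Returns None for bytes outside the IUPAC alphabet (a choice of this sketch).
fn code_of(base: u8) -> Option<u8> {
    Some(match base.to_ascii_uppercase() {
        b'A' => 0x1, b'C' => 0x2, b'T' | b'U' => 0x4, b'G' => 0x8,
        b'R' => 0x9, b'Y' => 0x6, b'S' => 0xA, b'W' => 0x5,
        b'K' => 0xC, b'M' => 0x3,
        b'B' => 0xE, b'D' => 0xD, b'H' => 0x7, b'V' => 0xB,
        b'N' => 0xF, b'-' | b'.' => 0x0,
        _ => return None,
    })
}

/// Inverse mapping: the table index is the 4-bit code.
fn base_of(code: u8) -> u8 {
    b"-ACMTWYHGRSVKDBN"[(code & 0xF) as usize]
}

/// Pack two nucleotides per byte, first nucleotide in the high nibble (assumed).
fn pack(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((seq.len() + 1) / 2);
    for pair in seq.chunks(2) {
        let hi = code_of(pair[0])?;
        // An odd-length sequence leaves the final low nibble as 0x0.
        let lo = match pair.get(1) {
            Some(&b) => code_of(b)?,
            None => 0,
        };
        out.push((hi << 4) | lo);
    }
    Some(out)
}

/// Unpack `len` nucleotides; `len` is needed to drop any padding nibble.
fn unpack(packed: &[u8], len: usize) -> Vec<u8> {
    packed
        .iter()
        .flat_map(|&byte| [base_of(byte >> 4), base_of(byte & 0xF)])
        .take(len)
        .collect()
}
```

Because U and T share the value 0x4, a round trip through this sketch turns `ACGU` into `ACGT`, matching the behaviour shown in the example above.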
simdna provides efficient SIMD-accelerated reverse complement operations for DNA/RNA sequences with consistent performance for both even and odd-length sequences:
use simdna::dna_simd_encoder::{reverse_complement, reverse_complement_encoded, encode_dna_prefer_simd};
// High-level API: ASCII in, ASCII out
let sequence = b"ACGT";
let rc = reverse_complement(sequence);
assert_eq!(rc, b"ACGT"); // ACGT is its own reverse complement
// Biological example
let forward = b"ATGCAACG";
let rc = reverse_complement(forward);
assert_eq!(rc, b"CGTTGCAT");
// Low-level API: operates directly on encoded data for maximum performance (~20 GiB/s)
let encoded = encode_dna_prefer_simd(b"ACGT");
let rc_encoded = reverse_complement_encoded(&encoded, 4);
// rc_encoded is the encoded form of "ACGT"
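To show why reverse complement can work directly on packed data, the following scalar sketch handles the even-length case: reverse the byte order, swap the two nibbles in each byte, and rotate each nibble by two bits. It assumes the first nucleotide of each pair sits in the high nibble, which this document does not specify, and it is not simdna's implementation. Odd lengths need an additional nibble shift that is not shown. The function name is hypothetical.

```rust
/// Reverse-complement a packed sequence holding an even number of nucleotides.
/// Assumes two 4-bit codes per byte with the first nucleotide in the high nibble.
fn reverse_complement_packed_even(packed: &[u8]) -> Vec<u8> {
    packed
        .iter()
        .rev() // reverse the order of nucleotide pairs
        .map(|&byte| {
            // Swap the two nibbles so the pair itself is reversed.
            let swapped = byte.rotate_left(4);
            // Rotate each nibble by two bits to complement both codes.
            ((swapped << 2) & 0xCC) | ((swapped >> 2) & 0x33)
        })
        .collect()
}
```

With that layout, `ATGCAACG` packs to `[0x14, 0x82, 0x11, 0x28]`, and the function returns `[0x28, 0x44, 0x82, 0x14]`, which is the packed form of `CGTTGCAT`.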
Reverse complement correctly handles all IUPAC ambiguity codes:
use simdna::dna_simd_encoder::reverse_complement;
// R (purine: A|G) complements to Y (pyrimidine: C|T)
assert_eq!(reverse_complement(b"R"), b"Y");
// Self-complementary codes: S (G|C), W (A|T), N (any)
assert_eq!(reverse_complement(b"SWN"), b"NWS");
"ACGT" and "acgt" encode identically.
simdna focuses exclusively on high-performance encoding/decoding, making it composable with any FASTA/FASTQ parser or custom format. This keeps the library lightweight and lets you choose the tools that fit your workflow.
seq_io is a fast FASTA/FASTQ parser. simdna works directly with its borrowed sequence data:
use seq_io::fasta::{Reader, Record}; // the Record trait provides seq()
use simdna::dna_simd_encoder::encode_dna_prefer_simd;
let mut reader = Reader::from_path("genome.fasta")?;
while let Some(record) = reader.next() {
    let record = record?;
    // seq_io provides &[u8] directly - no allocation needed.
    // For multi-line FASTA records the slice may still contain line breaks.
    let encoded = encode_dna_prefer_simd(record.seq());
    // ... use encoded data
}
noodles is a comprehensive bioinformatics I/O library:
use noodles::fasta;
use simdna::dna_simd_encoder::encode_dna_prefer_simd;
let mut reader = fasta::io::reader::Builder::default().build_from_path("genome.fasta")?;
for result in reader.records() {
    let record = result?;
    let encoded = encode_dna_prefer_simd(record.sequence().as_ref());
    // ... use encoded data
}
rust-bio provides algorithms and data structures for bioinformatics:
use bio::io::fasta;
use simdna::dna_simd_encoder::encode_dna_prefer_simd;
let reader = fasta::Reader::from_file("genome.fasta")?;
for result in reader.records() {
    let record = result?;
    let encoded = encode_dna_prefer_simd(record.seq());
    // ... use encoded data
}
simdna accepts &[u8] slices, enabling zero-copy integration with parsers. Avoid unnecessary allocations:
// ✓ Good: Work directly with borrowed data
let encoded = encode_dna_prefer_simd(record.seq());
// ✗ Avoid: Unnecessary allocation
let owned: Vec<u8> = record.seq().to_vec();
let encoded = encode_dna_prefer_simd(&owned);
Most FASTA/FASTQ parsers provide sequence data as &[u8] or types that implement AsRef<[u8]>, which work directly with simdna's API.
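If several parser types feed the same pipeline, a thin wrapper generic over `AsRef<[u8]>` keeps the call sites uniform while still borrowing the data. The sketch below is an illustration only: `encode_any` is a hypothetical helper, and the encoder is passed in as a closure so the example has no dependencies. It assumes the encoder takes `&[u8]` and returns `Vec<u8>`, which the examples above suggest but do not state.

```rust
/// Borrow any byte-like sequence and hand it to an encoder without copying.
/// `encode` stands in for a function such as simdna's encoder (assumed signature).
fn encode_any<S, F>(seq: S, encode: F) -> Vec<u8>
where
    S: AsRef<[u8]>,
    F: Fn(&[u8]) -> Vec<u8>,
{
    // as_ref() yields a borrowed &[u8] view; no intermediate Vec is created here.
    encode(seq.as_ref())
}
```

The same function then accepts `&[u8]`, `Vec<u8>`, `&str`, and `String` arguments, since all of them implement `AsRef<[u8]>`.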
| Platform | SIMD | Fallback |
|---|---|---|
| x86_64 | SSSE3 | Scalar |
| ARM64 | NEON | Scalar |
| Other | - | Scalar |
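The table above can be read as a dispatch rule: use SSSE3 on x86_64, NEON on ARM64, and scalar code everywhere else. The sketch below shows one way to express that rule with the standard library's runtime feature-detection macros. Whether simdna selects its backend at runtime or at compile time is not stated in this document, so treat this as an assumption; `detected_backend` is a hypothetical name.

```rust
/// Report which backend the table above would select on the current machine.
fn detected_backend() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime CPUID check provided by the standard library.
        if std::arch::is_x86_feature_detected!("ssse3") {
            return "ssse3";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    // Any other platform, or a CPU without the feature, uses scalar code.
    "scalar"
}
```

A real dispatcher would call the matching kernel instead of returning a label, but the branching structure would be similar.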
simdna employs multiple optimization strategies:

Benchmarks were obtained on a Mac Studio (Apple M1 Max, 32 GB RAM) running macOS Tahoe 26.1, using the Criterion.rs statistics-driven micro-benchmarking library.
simdna employs a comprehensive testing strategy to ensure correctness and robustness:
Run the standard test suite with:
cargo test
The unit tests cover:
simdna uses cargo-fuzz for property-based fuzz testing to discover edge cases and potential bugs. The following fuzz targets are available:
| Target | Description |
|---|---|
| roundtrip | Verifies encode→decode produces consistent output |
| valid_iupac | Tests encoding of valid IUPAC sequences |
| decode_robust | Tests decoder resilience to arbitrary byte sequences |
| boundaries | Tests sequence length boundary conditions |
| simd_scalar_equivalence | Verifies SIMD and scalar implementations produce identical results |
| bit_rotation | Verifies bit rotation complement properties (involution, consistency) |
| reverse_complement | Tests reverse complement correctness (double-rc = original) |
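Two of the properties named in the table, involution of the bit-rotation complement and double reverse complement restoring the input, can be stated as plain predicates. The sketch below is a dependency-free illustration over unpacked 4-bit codes (one code per element); it is not the project's fuzz harness, and all function names are hypothetical.

```rust
/// Complement a 4-bit code using the rotation formula from this README.
fn complement(bits: u8) -> u8 {
    ((bits << 2) | (bits >> 2)) & 0xF
}

/// Reverse complement over unpacked 4-bit codes.
fn reverse_complement_codes(codes: &[u8]) -> Vec<u8> {
    codes.iter().rev().map(|&c| complement(c)).collect()
}

/// Property: complementing twice returns the original code, for all 16 codes.
fn complement_is_involution() -> bool {
    (0u8..16).all(|c| complement(complement(c)) == c)
}

/// Property: reverse-complementing twice restores the input sequence.
/// Inputs are expected to hold 4-bit codes (values 0x0 to 0xF).
fn double_rc_is_identity(codes: &[u8]) -> bool {
    reverse_complement_codes(&reverse_complement_codes(codes)) == codes
}
```

A fuzz target would feed predicates like these with generated inputs; here the first can be checked exhaustively because there are only 16 codes.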
Run fuzz tests with:
cargo +nightly fuzz run <target> -- -max_total_time=60
Contributions are welcome! Please see CONTRIBUTING.md for guidelines on bug reports and feature requests.
See CHANGELOG.md for a history of changes to this project.
If you use simdna in your research, please cite it using the metadata in CITATION.cff. GitHub can also generate citation information directly from the repository page.
This project is licensed under the MIT License - see LICENSE for details.