| Crates.io | sview-fmindex |
| lib.rs | sview-fmindex |
| version | 0.1.1 |
| created_at | 2025-07-14 06:34:49.67833+00 |
| updated_at | 2025-08-15 09:52:29.686022+00 |
| description | FM-index library with slice view architecture for efficient text indexing and pattern matching |
| homepage | https://github.com/baku4/sview-fmindex |
| repository | https://github.com/baku4/sview-fmindex |
| max_upload_size | |
| id | 1751220 |
| size | 192,571 |
sview-fmindex is a Rust library for building FM-index into pre-allocated blob and using it with minimal copying through slice view.
-fmindex: FM-index is a compressed text index that provides two main operations:
sview-: In this library, data is stored in one contiguous blob, and the FM-index structure is created as a slice view into this blob. builder
┌────────┐
│ header │
└────────┘
│
|-(build with text)
blob ▼
┌────────┬──────────────────────┐
│ header │ body (LARGE) │
└────────┴──────────────────────┘
│
│-(load index)
fm-index ▼
┌────────┬────────┐
│ header │ view │
└────────┴────────┘
use sview_fmindex::{FmIndexBuilder, FmIndex};
use sview_fmindex::blocks::Block2; // Block2 can index 4 symbols
use sview_fmindex::text_encoders::EncodingTable;
// (1) Define symbols to use
let symbols: &[&[u8]] = &[b"Aa", b"Cc", b"Gg", b"Tt"];
let encoding_table = EncodingTable::from_symbols(symbols);
let symbol_count = encoding_table.symbol_count(); // 4
// (2) Build index
let text = b"CTCCGTACACCTGTTTCGTATCGGAXXYYZZ".to_vec();
let builder = FmIndexBuilder::<u32, Block2<u64>, EncodingTable>::new(
text.len(),
symbol_count,
encoding_table,
).unwrap();
// You have to prepare a blob to build the index.
let blob_size = builder.blob_size();
let mut blob = vec![0; blob_size];
// Build the fm-index to the blob.
builder.build(text, &mut blob).unwrap();
// Load the fm-index from the blob.
let fm_index = FmIndex::<u32, Block2<u64>, EncodingTable>::load(&blob[..]).unwrap();
// (3) Match with pattern
let pattern = b"TA";
// - count
let count = fm_index.count(pattern);
assert_eq!(count, 2);
// - locate
let mut locations = fm_index.locate(pattern);
locations.sort(); // The locations may not be in order.
assert_eq!(locations, vec![5,18]);
// When using EncodingTable, the last symbol is treated as wild card.
// In the text, X, Y, and Z can match any other unindexed character
let mut locations = fm_index.locate(b"UNDEF");
locations.sort();
// Using the b"XXXXX", b"YYYYY", or b"!@#$%" gives the same result.
assert_eq!(locations, vec![25,26]);
| Load strategy | Avg RSS | Peak RSS | Blob load (ms) | Locate per pattern (ns) |
|---|---|---|---|---|
| Full in‑memory | 2.82 GiB | 4.50 GiB | 2,388 | 1,369 ns |
mmap (no Advice) |
0.48 GiB | 0.52 GiB | 0.04 | 1,365 ns |
u32, Block: Block2<u64>, UncompressedBase repo: This repository was forked from baku4/lt-fm-index (v0.7.0, commit 1327896).
FM-index implementation:
K-mer Lookup table implementation:
Burrows-Wheeler Transform: