pgm_index

Crates.iopgm_index
lib.rspgm_index
version0.3.2
created_at2025-08-12 21:37:29.033259+00
updated_at2025-08-15 17:21:05.672101+00
descriptionUltra-fast learned PGM-Index for efficient sorted key lookup with bounded error
homepagehttps://github.com/ARyaskov/pgm_index
repositoryhttps://github.com/ARyaskov/pgm_index
max_upload_size
id1792934
size57,538
Andrei RiaskΓ³v (ARyaskov)

documentation

https://docs.rs/pgm_index

README

πŸ“ pgm_index β€” Learned Index for Sorted Keys

Crates.io Downloads (recent)

PGM-Index is a space-efficient data structure for fast lookup in sorted sequences.
It approximates the distribution of keys with piecewise linear models, allowing searches in O(log Ξ΅) with a guaranteed error bound.


πŸ“„ Algorithm

Based on the work by Paolo Ferragina & Giorgio Vinciguerra:

The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds (2020)
πŸ”— Paper Β· 🌐 Official site


πŸ“¦ Installation

Add to your Cargo.toml:

[dependencies]
pgm_index = "*"

πŸ›  Usage

use pgm_index::PGMIndex;

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    let pgm = PGMIndex::new(&data, 32); // Ξ΅ = 32

    let key = 123456;
    if let Some(pos) = pgm.search(key) {
        println!("Found at position {}", pos);
    } else {
        println!("Not found");
    }
}

🏎 Benchmarks by Ρ

Dataset: 1,000,000 elements, 100,000 random queries CPU: Intel Core i7 12700, Windows 11, single-threaded

Ξ΅ Build Time Mem Usage Segments Single Lookup Batch Lookup Avg ns/query
16 2.19 ms 7.99 MB 3906 20.84 M/s 25.35 M/s 48.0 / 39.4
32 2.22 ms 7.87 MB 1953 24.09 M/s 24.98 M/s 41.5 / 40.0
64 2.05 ms 7.75 MB 976 21.96 M/s 26.06 M/s 45.5 / 38.4
128 2.07 ms 7.69 MB 488 19.64 M/s 25.56 M/s 50.9 / 39.1

Binary Search (baseline): 4.32 ms, 23.13 M/s, 43.2 ns/query.


πŸ“Š Comparison to Other Indexes (1M elements) *

Using Ξ΅ = 32 as a balanced configuration:

Structure Memory Usage Build Time Lookup Speed (single) Batch Lookup Speed
PGM-Index (Ξ΅=32) 7.87 MB 2.22 ms 24.09 M/s 24.98 M/s
Binary Search ~8.0 MB β€” (no build) 23.13 M/s (0.96Γ—) 23.13 M/s (0.93Γ—)
BTreeMap ~24 MB ~50 ms ~4.0 M/s (0.17Γ—) ~4.0 M/s (0.16Γ—)
HashMap ~64 MB ~15 ms ~40.0 M/s (1.66Γ—) ~40.0 M/s (1.60Γ—)
  • our benchmarks

πŸ“ˆ Relative Performance *

Metric vs Binary Search vs BTreeMap vs HashMap
Memory 1.02Γ— better 3.05Γ— better 8.13Γ— better
Build Time β€” 22.5Γ— faster 6.8Γ— faster
Single Lookup 1.04Γ— faster 6.0Γ— faster 0.6Γ— slower
Batch Lookup 1.08Γ— faster 6.2Γ— faster 0.62Γ— slower
  • our benchmarks

πŸ“Œ Potential Use Cases

  • Indexing large sorted numeric datasets
  • Time-series databases
  • Read-optimized storage engines
  • Scientific & bioinformatics data search
  • Columnar store secondary indexes

πŸ“œ License

MIT

Commit count: 0

cargo fmt