fastset

Crates.iofastset
lib.rsfastset
version0.4.1
sourcesrc
created_at2024-03-07 03:08:22.9379
updated_at2024-04-05 10:54:30.732849
descriptionFast set implementation for dense, bounded integer collections, optimized for quick updates and access.
homepagehttps://github.com/b-vitamins/fastset
repositoryhttps://github.com/b-vitamins/fastset
max_upload_size
id1165497
size147,794
Ayan (b-vitamins)

documentation

https://docs.rs/fastset

README

fastset

Crates.io docs.rs License GitHub Workflow Status

Fast set implementation for dense, bounded integer collections, offering quick updates and random access.

Features

  • Tailored for unsigned integer (usize) elements, ideal for index-based applications
  • Fast insertion, removal, and membership check
  • random method for uniform random sampling
  • Paging mechanism to somewhat mitigate the large memory footprint1

Note that while paging improves the existing memory footprint, fastset::Set is still not a good solution for memory constrained applications or for applications with storage need for sparse elements spread over an extended range. For integers twice as sparse as the page size, the fastset::Set with paging has peak heap allocation ~ 8x that of std::collections::HashSet.

Benchmarks

Operation fastset::Set hashbrown::HashSet std::collections::HashSet
insert 1.1632 ns 4.7105 ns 14.136 ns
remove 1.1647 ns 3.0459 ns 10.625 ns
contains 932.81 ps 985.38 ps 13.827 ns
random 651.26 ps N/A N/A
  • CPU: AMD Ryzen™ 5 5600G with Radeon™ Graphics x 12
  • RAM: 58.8 GiB
  • OS: Guix System, 64-bit

Usage

use fastset::{set, Set};
use nanorand::WyRand;

   let mut set = set![5, 10, 15, 20, 25, 30]; // Initialize set with elements
   assert!(set.contains(&5)); // Check for element presence

   set.insert(35); // Insert a new element
   assert!(set.contains(&35));

   set.remove(&5); // Remove an element
   assert!(!set.contains(&5));

   if let Some(taken) = set.take(&10) { // Remove and return an element
       assert_eq!(taken, 10);
   }

   let mut rng = WyRand::new();
   if let Some(element) = set.random(&mut rng) { // Get a random element
       set.remove(&element); // Remove the randomly selected element
       assert!(!set.contains(&element));
   }

   println!("Set: {:?}, Length: {}", set, set.len()); // Display the set and its length

Delphic Sets

fastset::Set, as implemented here, meets the conditions for being a Delphic set [1], [2]:

Let Ω be a discrete universe. A set (S ⊆ Ω) is considered a member of a Delphic family if it supports the following operations within O(log |Ω|) time:

  • Membership: Verify if any element (x ∈ Ω) exists within (S).
  • Cardinality: Determine the size of (S), i.e., (|S|).
  • Sampling: Draw a uniform random sample from (S).

A unit test in src/set.rs verifies the uniform sampling property with a basic Chi-squared test.

use fastset::Set;
use nanorand::WyRand;
use statrs::distribution::{ChiSquared, ContinuousCDF};

fn sampling_is_uniformly_at_random() {
    const SAMPLES: usize = 1_000_000;
    const EDGE_OF_THE_UNIVERSE: usize = 10000;

    let elements = (1..=EDGE_OF_THE_UNIVERSE).collect::<Vec<_>>();
    let set = Set::from(elements.clone());
    let mut rng = WyRand::new_seed(42u64);
    let mut counts = vec![0f64; elements.len()];

(0..SAMPLES).for_each(|_| {
   if let Some(value) = set.random(&mut rng) {
       counts[value - 1] += 1.0;
   }
});

    let e = SAMPLES as f64 / elements.len() as f64;
    let statistic: f64 = counts.iter().map(|&o| { (o - e) * (o - e) / e }).sum();

    let dof = elements.len() - 1;
    let chi = ChiSquared::new(dof as f64).unwrap();
    let acceptable = chi.inverse_cdf(0.99);

    // Null hypothesis: Elements are sampled uniformly at random
    assert!(
        statistic < acceptable,
        "Chi-square statistic {} is greater than what's acceptable ({})",
        statistic,
        acceptable,
    );
}

References

[1]: Chakraborty, Sourav, N. V. Vinodchandran, and Kuldeep S. Meel. "Distinct Elements in Streams: An Algorithm for the (Text) Book." arXiv preprint arXiv:2301.10191 (2023).

[2]: Meel, Kuldeep S., Sourav Chakraborty, and N. V. Vinodchandran. "Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size." Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 2022.

Footnotes

  1. A paging mechanism is introduced in 0.4.0 that reduces the memory-footprint of fastset::Set. With the paging feature, fastset::Set achieves ~ 50% reduction in peak heap memory allocations with no additional performance overhead.

Commit count: 48

cargo fmt