Crates.io | fastset |
lib.rs | fastset |
version | 0.4.1 |
source | src |
created_at | 2024-03-07 03:08:22.9379 |
updated_at | 2024-04-05 10:54:30.732849 |
description | Fast set implementation for dense, bounded integer collections, optimized for quick updates and access. |
homepage | https://github.com/b-vitamins/fastset |
repository | https://github.com/b-vitamins/fastset |
max_upload_size | |
id | 1165497 |
size | 147,794 |
Fast set implementation for dense, bounded integer collections, offering quick updates and random access.
usize
) elements, ideal for index-based applicationsrandom
method for uniform random samplingNote that while paging improves the existing memory footprint,
fastset::Set
is still not a good solution for memory constrained applications
or for applications with storage need for sparse elements spread over an extended range.
For integers twice as sparse as the page size, the fastset::Set
with paging
has peak heap allocation ~ 8x that of std::collections::HashSet
.
Operation | fastset::Set |
hashbrown::HashSet |
std::collections::HashSet |
---|---|---|---|
insert | 1.1632 ns | 4.7105 ns | 14.136 ns |
remove | 1.1647 ns | 3.0459 ns | 10.625 ns |
contains | 932.81 ps | 985.38 ps | 13.827 ns |
random | 651.26 ps | N/A | N/A |
use fastset::{set, Set};
use nanorand::WyRand;
let mut set = set![5, 10, 15, 20, 25, 30]; // Initialize set with elements
assert!(set.contains(&5)); // Check for element presence
set.insert(35); // Insert a new element
assert!(set.contains(&35));
set.remove(&5); // Remove an element
assert!(!set.contains(&5));
if let Some(taken) = set.take(&10) { // Remove and return an element
assert_eq!(taken, 10);
}
let mut rng = WyRand::new();
if let Some(element) = set.random(&mut rng) { // Get a random element
set.remove(&element); // Remove the randomly selected element
assert!(!set.contains(&element));
}
println!("Set: {:?}, Length: {}", set, set.len()); // Display the set and its length
fastset::Set
, as implemented here, meets the conditions for being a Delphic set [1], [2]:
Let Ω be a discrete universe. A set (S ⊆ Ω) is considered a member of a Delphic family if it supports the following operations within O(log |Ω|) time:
A unit test in src/set.rs
verifies the uniform sampling property with a basic Chi-squared test.
use fastset::Set;
use nanorand::WyRand;
use statrs::distribution::{ChiSquared, ContinuousCDF};
fn sampling_is_uniformly_at_random() {
const SAMPLES: usize = 1_000_000;
const EDGE_OF_THE_UNIVERSE: usize = 10000;
let elements = (1..=EDGE_OF_THE_UNIVERSE).collect::<Vec<_>>();
let set = Set::from(elements.clone());
let mut rng = WyRand::new_seed(42u64);
let mut counts = vec![0f64; elements.len()];
(0..SAMPLES).for_each(|_| {
if let Some(value) = set.random(&mut rng) {
counts[value - 1] += 1.0;
}
});
let e = SAMPLES as f64 / elements.len() as f64;
let statistic: f64 = counts.iter().map(|&o| { (o - e) * (o - e) / e }).sum();
let dof = elements.len() - 1;
let chi = ChiSquared::new(dof as f64).unwrap();
let acceptable = chi.inverse_cdf(0.99);
// Null hypothesis: Elements are sampled uniformly at random
assert!(
statistic < acceptable,
"Chi-square statistic {} is greater than what's acceptable ({})",
statistic,
acceptable,
);
}
[1]: Chakraborty, Sourav, N. V. Vinodchandran, and Kuldeep S. Meel. "Distinct Elements in Streams: An Algorithm for the (Text) Book." arXiv preprint arXiv:2301.10191 (2023).
[2]: Meel, Kuldeep S., Sourav Chakraborty, and N. V. Vinodchandran. "Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size." Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 2022.
A paging mechanism is introduced in 0.4.0
that reduces the memory-footprint of fastset::Set
.
With the paging feature, fastset::Set
achieves ~ 50% reduction in peak heap memory allocations
with no additional performance overhead. ↩