| Crates.io | sbitmap |
| lib.rs | sbitmap |
| version | 0.1.1 |
| created_at | 2025-10-01 04:20:47.870786+00 |
| updated_at | 2025-10-07 03:25:21.497278+00 |
| description | Fast and scalable bitmap implementation based on Linux kernel's sbitmap |
| homepage | |
| repository | https://github.com/ublk-org/sbitmap |
| max_upload_size | |
| id | 1862067 |
| size | 103,138 |
A fast and scalable lock-free bitmap implementation based on the Linux kernel's sbitmap.
sbitmap provides a high-performance, cache-line optimized bitmap for concurrent bit allocation across multiple threads. It's designed for scenarios where many threads need to allocate and free bits from a shared pool efficiently.
This implementation is based on the Linux kernel's sbitmap (from lib/sbitmap.c), specifically designed for:
SbitmapWord is aligned to 64 bytesAdd to your Cargo.toml:
[dependencies]
sbitmap = "0.1"
use sbitmap::Sbitmap;
// Create a bitmap with 1024 bits (non-round-robin mode)
let sb = Sbitmap::new(1024, None, false);
// Each caller maintains its own allocation hint
let mut hint = 0;
// Allocate a bit
if let Some(bit) = sb.get(&mut hint) {
// Use the allocated bit
println!("Allocated bit: {}", bit);
// Free it when done
sb.put(bit, &mut hint);
}
use sbitmap::Sbitmap;
use std::sync::Arc;
use std::thread;
let sb = Arc::new(Sbitmap::new(1024, None, false));
let mut handles = vec![];
for _ in 0..8 {
let sb = Arc::clone(&sb);
handles.push(thread::spawn(move || {
// Each thread maintains its own hint in local context
let mut hint = 0;
// Each thread can safely allocate/free bits
if let Some(bit) = sb.get(&mut hint) {
// Do work...
sb.put(bit, &mut hint);
}
}));
}
for h in handles {
h.join().unwrap();
}
use sbitmap::Sbitmap;
// Create a bitmap with 1024 bits
let sb = Sbitmap::new(1024, None, false);
let mut hint = 0;
// Allocate 4 consecutive bits atomically
if let Some(start_bit) = sb.get_batch(4, &mut hint) {
// Use bits: start_bit, start_bit+1, start_bit+2, start_bit+3
println!("Allocated bits {}-{}", start_bit, start_bit + 3);
// Process consecutive resources...
for i in 0..4 {
println!("Using bit {}", start_bit + i);
}
// Free all 4 bits atomically when done
sb.put_batch(start_bit, 4, &mut hint);
}
Note: Batch operations require nr_bits <= bits_per_word(). All consecutive bits are guaranteed to be within the same word (no spanning across word boundaries).
Sbitmap::new(depth: usize, shift: Option<u32>, round_robin: bool) -> SelfCreate a new sbitmap with depth bits. The shift parameter controls how many bits per word (2^shift bits per word) and is critical for performance - it determines how bits are spread across multiple cache-line aligned words. When None, the shift is auto-calculated for optimal cache usage. The round_robin parameter enables strict round-robin allocation order (usually false for better performance).
Understanding the shift parameter:
get(&self, hint: &mut usize) -> Option<usize>Allocate a free bit. The hint parameter is a mutable reference to the caller's allocation hint, which helps reduce contention by spreading allocations across different parts of the bitmap. Returns Some(bit_number) on success or None if no free bits are available.
put(&self, bitnr: usize, hint: &mut usize)Free a previously allocated bit. The hint parameter is updated to improve cache locality for subsequent allocations.
get_batch(&self, nr_bits: usize, hint: &mut usize) -> Option<usize>Allocate nr_bits consecutive free bits from the bitmap atomically. This operation provides acquire barrier semantics on success. Only supports nr_bits <= bits_per_word() to ensure all bits are within the same word (no spanning across word boundaries).
Returns Some(start_bit) where start_bit is the first bit of the allocated consecutive range, or None if no consecutive nr_bits are available or nr_bits > bits_per_word().
Use cases:
put_batch(&self, bitnr: usize, nr_bits: usize, hint: &mut usize)Free nr_bits consecutive previously allocated bits starting from bitnr. This operation provides release barrier semantics, ensuring that all writes to data associated with these bits are visible before the bits are freed. Only supports nr_bits <= bits_per_word() to ensure all bits are within the same word.
The hint parameter is updated for better cache locality in subsequent allocations.
test_bit(&self, bitnr: usize) -> boolCheck if a bit is currently allocated.
weight(&self) -> usizeCount the number of currently allocated bits.
depth(&self) -> usizeGet the total number of bits in the bitmap.
The shift parameter is crucial for tuning sbitmap performance based on your workload:
When to use a smaller shift:
Examples:
// High contention scenario (32-core NUMA system)
let sb = Sbitmap::new(1024, Some(4), false); // 2^4 = 16 bits per word, 64 words
// Low contention scenario (4-core system)
let sb = Sbitmap::new(1024, Some(6), false); // 2^6 = 64 bits per word, 16 words
// Let sbitmap decide (recommended starting point)
let sb = Sbitmap::new(1024, None, false); // Auto-calculated based on depth
Trade-offs:
None) provides a balanced default suitable for most workloadsget(): Acquire semantics - ensures allocated bit is visible before useput(): Release semantics - ensures all writes complete before bit is freedget_batch(): Acquire semantics - ensures all allocated bits are visible before useput_batch(): Release semantics - ensures all writes complete before bits are freed| Feature | sbitmap | Mutex + BitVec | AtomicBitSet |
|---|---|---|---|
| Lock-free | ✅ | ❌ | ✅ |
| Cache-optimized | ✅ | ❌ | ❌ |
| Per-thread hints | ✅ | ❌ | ❌ |
| Kernel-proven design | ✅ | ❌ | ❌ |
To compare sbitmap performance against a simple lockless bitmap:
# Run with defaults (32 bits, auto shift, 10 seconds, N-1 tasks)
cargo run --bin bench_compare --release
# Specify bitmap depth and duration
cargo run --bin bench_compare --release -- --depth 1024 --time 5
# Specify bitmap depth, shift, and duration
cargo run --bin bench_compare --release -- --depth 512 --shift 5 --time 10
# Benchmark batch operations (allocating 4 consecutive bits)
cargo run --bin bench_compare --release -- --depth 128 --batch 4 --time 5
# Show help
cargo run --bin bench_compare --release -- --help
This benchmark:
Options:
--depth DEPTH - Bitmap depth in bits (default: 32)--shift SHIFT - log2(bits per word), auto-calculated if not specified--time TIME - Benchmark duration in seconds (default: 10)--tasks TASKS - Number of concurrent tasks (default: NUM_CPUS - 1)--batch NR_BITS - Use get_batch/put_batch with NR_BITS (default: 1, single bit mode)--round-robin - Enable round-robin allocation mode (default: disabled)See benches/README.md for more details.
Example output on a 32-CPU system:
System: 32 CPUs detected, 2 NUMA nodes, using 31 tasks for benchmark
Bitmap depth: 32 bits
Shift: auto-calculated (bits per word: 8)
Duration: 10 seconds
=== Sbitmap (Optimized) Benchmark ===
Configuration:
- Duration: 10s
- Tasks: 31
- Bitmap depth: 32 bits
Results:
Task 0: 3101117 ops, 310111 ops/sec (0.3101 Mops/sec)
...
Task 30: 3169582 ops, 316958 ops/sec (0.3170 Mops/sec)
Total: 93604448 ops, 9360444 ops/sec (9.3604 Mops/sec)
=== SimpleBitmap (Baseline) Benchmark ===
Configuration:
- Duration: 10s
- Tasks: 31
- Bitmap depth: 32 bits
Results:
Task 0: 1998241 ops, 199824 ops/sec (0.1998 Mops/sec)
...
Task 30: 1835360 ops, 183536 ops/sec (0.1835 Mops/sec)
Total: 62530560 ops, 6253056 ops/sec (6.2531 Mops/sec)
Licensed under either of:
at your option.
Contributions are welcome! Please feel free to submit a Pull Request.
Based on the Linux kernel's sbitmap implementation by Jens Axboe and other contributors.