| Crates.io | small-map |
| lib.rs | small-map |
| version | 0.1.5 |
| created_at | 2023-11-02 06:51:32.722293+00 |
| updated_at | 2026-01-07 23:04:47.693469+00 |
| description | An inline SIMD accelerated hashmap designed for small amount of data. |
| homepage | |
| repository | https://github.com/ihciah/small-map |
| max_upload_size | |
| id | 1022342 |
| size | 180,852 |
An inline SIMD accelerated hashmap designed for small amount of data.
```rust
use small_map::FxSmallMap;

// Don't worry about the 32 here; it's the inline capacity, not a hard limit.
// If the size grows beyond 32, the map automatically switches to a heap-backed impl.
type MySmallMap<K, V> = FxSmallMap<32, K, V>;

let mut map = MySmallMap::new();
map.insert(1_u8, 2_u8);
assert_eq!(*map.get(&1).unwrap(), 2);
```
This map is best suited for short-lived data with small keys and values, for example HTTP or RPC headers. You can use it like any other hashmap without worrying about its size.
The choice of N involves a trade-off between inline storage benefits and struct size overhead:
`Struct size = N * (1 + sizeof(K) + sizeof(V)) + 2 * sizeof(usize) + sizeof(Hasher)`
Tip: Choose N as a multiple of SIMD_WIDTH (16 on x86_64, 8 on aarch64) for optimal SIMD utilization. When uncertain, start with N = 16 or N = 32.
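As a rough illustration, the formula above can be evaluated directly. The helper below is hypothetical (not part of the crate's API), and `hasher_size` stands in for whatever the chosen hasher's state occupies; the `1` byte per slot is the h2 control byte.

```rust
use std::mem::size_of;

// A rough estimate of the inline struct size following the formula above.
// `hasher_size` is a hypothetical parameter for the hasher's state size;
// the `1` byte per slot is the h2 control byte. This is an illustration,
// not the crate's exact layout.
const fn estimated_size<K, V>(n: usize, hasher_size: usize) -> usize {
    n * (1 + size_of::<K>() + size_of::<V>()) + 2 * size_of::<usize>() + hasher_size
}

// e.g. N = 32 with u8 keys/values and an 8-byte hasher state:
// estimated_size::<u8, u8>(32, 8) == 32 * 3 + 16 + 8 == 120 bytes
```

Plugging in your own `K`/`V` this way makes it easy to check whether a given `N` keeps the struct at a size you are comfortable passing around.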
The performance of SmallMap depends on the capacity, the key/value sizes, and the operation pattern. Setting the size to 16 (or 32) is recommended, because with SSE2 a single instruction can probe 16 control bytes at once. SmallMap is only recommended for small keys and values, such as numbers and strings.
> [!NOTE]
> The lower the time (ns), the better the performance. You can find the benchmark source code in the `benches` folder. Benchmarks use u8 keys/values on an AMD Ryzen 9 7950X.
Like SmallVec, for a HashMap holding a small amount of data, inlining the data avoids heap allocation overhead and improves performance.
SmallMap stores key-value pairs in a contiguous inline array, with an additional control byte array storing the hash fingerprint for SIMD-accelerated lookup.
The lookup strategy is determined at compile time based on N and at runtime based on the current length:

- **Linear search** (no hash computation): when `N < SIMD_WIDTH` or `len < SIMD_WIDTH`
- **SIMD search** (with the h2 hash): when `N >= SIMD_WIDTH` and `len >= SIMD_WIDTH`
- **Heap fallback**: when `len > N`
```
┌─────────────────────────────────────┐
│ AlignedGroups (N bytes)             │ ← h2 control bytes for SIMD
├─────────────────────────────────────┤
│ len: usize                          │
├─────────────────────────────────────┤
│ data: [(K, V); N]                   │ ← contiguous key-value pairs
└─────────────────────────────────────┘
```
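The diagram above roughly corresponds to a struct like the one below. This is an illustrative sketch only: the real crate uses uninitialized storage and alignment wrappers for the control bytes, and the name `SmallMapInline` is hypothetical.

```rust
// Hypothetical struct mirroring the layout diagram above; the real crate
// wraps the control bytes for SIMD alignment and keeps slots uninitialized
// until used.
#[repr(C)]
struct SmallMapInline<const N: usize, K, V> {
    groups: [u8; N],   // h2 control bytes, probed SIMD_WIDTH at a time
    len: usize,        // number of live entries
    data: [(K, V); N], // contiguous key-value slots
}
```

Keeping the control bytes in their own contiguous array is what lets a single SIMD load compare many fingerprints at once before any full key comparison happens.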
Let W = SIMD_WIDTH (16 on x86_64, 8 on aarch64)
| Operation | len ≤ W | W < len ≤ N | len > N |
|---|---|---|---|
| `get` | O(len) | O(len/W) | O(1)* |
| `insert` | O(len) | O(len/W) | O(1)* |
| `insert_unique_unchecked` | O(1) | O(1) | O(1)* |
| `remove` | O(len) | O(len/W) | O(1)* |
| `iter` | O(len) | O(len) | O(capacity) |

\* amortized
This project heavily references and borrows from hashbrown, a very elegant and efficient implementation.