| Crates.io | jdb_idx |
| lib.rs | jdb_idx |
| version | 0.1.1 |
| created_at | 2025-12-23 21:04:38.867078+00 |
| updated_at | 2025-12-23 21:04:38.867078+00 |
| description | High-performance SSTable index engine / 高性能 SSTable 索引引擎 |
| homepage | https://github.com/js0/jdb/tree/main/jdb_idx |
| repository | https://github.com/js0/jdb.git |
| max_upload_size | |
| id | 2002340 |
| size | 97,732 |
jdb_idx is the SSTable index module for JDB (a key-value database). It provides efficient sorted key-value storage with sparse indexing, block-based organization, and LRU caching for high-performance read/write operations.
SSTable (Sorted String Table) is the core storage format used by LSM-Tree based databases. jdb_idx implements:
bytes::Bytescargo add jdb_idx
use jdb_idx::SstWriter;
use jdb_proto::ValPtr;
async fn write_sst() -> jdb_idx::Result<()> {
let mut writer = SstWriter::create("data.sst").await?;
// Keys must be added in sorted order
for i in 0u32..1000 {
let key = format!("key{i:05}");
let ptr = ValPtr::new(1, i as u64 * 100, 50);
writer.add(key.as_bytes(), &ptr).await?;
}
let footer = writer.finish().await?;
println!("Wrote {} entries in {} blocks", footer.num_entries, footer.num_blocks);
Ok(())
}
use jdb_idx::{SstReader, BlockCache};
async fn read_sst() -> jdb_idx::Result<()> {
let reader = SstReader::open(1, "data.sst").await?;
let cache = BlockCache::new(1024);
// Point query
if let Some(ptr) = reader.get(b"key00500", &cache).await? {
println!("Found: offset={}, len={}", ptr.offset, ptr.len);
}
// Full scan
let mut iter = reader.iter();
while let Some((key, ptr)) = iter.next().await? {
println!("{:?} -> {:?}", key, ptr);
}
Ok(())
}
┌─────────────────────────────────────────┐
│ Data Blocks (4KB each) │
│ ┌─────────────────────────────────┐ │
│ │ Block 0: [Count][Records...] │ │
│ ├─────────────────────────────────┤ │
│ │ Block 1: [Count][Records...] │ │
│ ├─────────────────────────────────┤ │
│ │ ... │ │
│ └─────────────────────────────────┘ │
├─────────────────────────────────────────┤
│ Index Block │
│ [Count:u32][KeyLen:u16][Key]... │
├─────────────────────────────────────────┤
│ Footer (64 bytes) │
│ [IndexOffset][IndexLen][NumBlocks] │
│ [NumEntries][Padding][Magic:JDBX] │
└─────────────────────────────────────────┘
graph TD
A[SstWriter] -->|add key-ptr| B[BlockBuilder]
B -->|block full| C[Flush to File]
B -->|record first key| D[IndexBuilder]
C --> E[Next Block]
D -->|finish| F[Write Index]
F --> G[Write Footer]
H[SstReader] -->|open| I[Read Footer]
I --> J[Load Index]
J --> K[IndexSearcher]
L[Query Key] --> K
K -->|binary search| M[Block Index]
M --> N[BlockCache]
N -->|miss| O[Load from Disk]
N -->|hit| P[BlockIter]
O --> P
P -->|linear search| Q[Result]
| Type | Description |
|---|---|
SstWriter |
Async SST file writer, accepts sorted key-ptr pairs |
SstReader |
Async SST file reader with point query and iteration |
SstIter |
Lazy async iterator for full table scan |
BlockBuilder |
Builds 4KB data blocks with key-ptr records |
BlockIter |
Iterator for reading entries from block |
IndexBuilder |
Builds sparse index from first keys |
IndexSearcher |
Binary search over sparse index |
BlockCache |
Thread-safe LRU cache for data blocks |
Footer |
SST file metadata (64 bytes) |
| Constant | Value | Description |
|---|---|---|
BLOCK_SIZE |
4096 | Data block size in bytes |
FOOTER_SIZE |
64 | Footer size in bytes |
| Error | Description |
|---|---|
InvalidFooter |
Footer format invalid |
MagicMismatch |
Magic number mismatch |
InvalidIndex |
Index format invalid |
EmptySst |
SST file has no entries |
FileTooSmall |
File smaller than minimum size |
KeyTooLarge |
Key exceeds block capacity |
IndexTooLarge |
Index entry count overflow |
jdb_idx/
├── src/
│ ├── lib.rs # Module exports
│ ├── block.rs # BlockBuilder, BlockIter
│ ├── cache.rs # BlockCache (LRU)
│ ├── error.rs # Error types
│ ├── index.rs # IndexBuilder, IndexSearcher
│ ├── layout.rs # Footer, constants
│ ├── reader.rs # SstReader, SstIter
│ └── writer.rs # SstWriter
├── tests/
│ └── main.rs # Integration tests
└── Cargo.toml
| Component | Library | Purpose |
|---|---|---|
| Async I/O | compio | io_uring based async runtime |
| Buffer | bytes | Zero-copy byte buffer |
| Cache | lru + parking_lot | Thread-safe LRU cache |
| Error | thiserror | Error type derivation |
| Alloc | jdb_alloc | Page-aligned buffer allocation |
| File | jdb_fs | Direct I/O file operations |
The SSTable format originated from Google's Bigtable paper (2006), which introduced the LSM-Tree architecture for handling write-heavy workloads. The key insight was separating writes (sequential, fast) from reads (random, slower) by buffering writes in memory and periodically flushing sorted runs to disk.
LevelDB (2011) brought SSTable to the open-source world, followed by RocksDB (2012) which added numerous optimizations. The 4KB block size aligns with filesystem page size and SSD erase blocks, enabling Direct I/O to bypass kernel page cache for predictable latency.
The sparse index technique trades memory for disk seeks: instead of indexing every key, only the first key of each block is stored. This reduces index size by ~100x while adding at most one extra disk read per query.
This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.
We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us:
jdb_idx 是 JDB 键值数据库的 SSTable 索引模块。提供高效的有序键值存储,支持稀疏索引、块组织和 LRU 缓存,实现高性能读写操作。
SSTable(Sorted String Table)是 LSM-Tree 数据库的核心存储格式。jdb_idx 实现了:
bytes::Bytes 的零拷贝反序列化cargo add jdb_idx
use jdb_idx::SstWriter;
use jdb_proto::ValPtr;
async fn write_sst() -> jdb_idx::Result<()> {
let mut writer = SstWriter::create("data.sst").await?;
// Key 必须按序添加
for i in 0u32..1000 {
let key = format!("key{i:05}");
let ptr = ValPtr::new(1, i as u64 * 100, 50);
writer.add(key.as_bytes(), &ptr).await?;
}
let footer = writer.finish().await?;
println!("写入 {} 条记录,共 {} 块", footer.num_entries, footer.num_blocks);
Ok(())
}
use jdb_idx::{SstReader, BlockCache};
async fn read_sst() -> jdb_idx::Result<()> {
let reader = SstReader::open(1, "data.sst").await?;
let cache = BlockCache::new(1024);
// 点查询
if let Some(ptr) = reader.get(b"key00500", &cache).await? {
println!("找到: offset={}, len={}", ptr.offset, ptr.len);
}
// 全表扫描
let mut iter = reader.iter();
while let Some((key, ptr)) = iter.next().await? {
println!("{:?} -> {:?}", key, ptr);
}
Ok(())
}
┌─────────────────────────────────────────┐
│ 数据块 (每块 4KB) │
│ ┌─────────────────────────────────┐ │
│ │ Block 0: [Count][Records...] │ │
│ ├─────────────────────────────────┤ │
│ │ Block 1: [Count][Records...] │ │
│ ├─────────────────────────────────┤ │
│ │ ... │ │
│ └─────────────────────────────────┘ │
├─────────────────────────────────────────┤
│ 索引块 │
│ [Count:u32][KeyLen:u16][Key]... │
├─────────────────────────────────────────┤
│ Footer (64 字节) │
│ [IndexOffset][IndexLen][NumBlocks] │
│ [NumEntries][Padding][Magic:JDBX] │
└─────────────────────────────────────────┘
graph TD
A[SstWriter] -->|add key-ptr| B[BlockBuilder]
B -->|块满| C[刷盘]
B -->|记录首 key| D[IndexBuilder]
C --> E[下一块]
D -->|finish| F[写入索引]
F --> G[写入 Footer]
H[SstReader] -->|open| I[读取 Footer]
I --> J[加载索引]
J --> K[IndexSearcher]
L[查询 Key] --> K
K -->|二分查找| M[块索引]
M --> N[BlockCache]
N -->|未命中| O[从磁盘加载]
N -->|命中| P[BlockIter]
O --> P
P -->|线性查找| Q[结果]
| 类型 | 说明 |
|---|---|
SstWriter |
异步 SST 写入器,接收有序 key-ptr 对 |
SstReader |
异步 SST 读取器,支持点查和迭代 |
SstIter |
惰性异步迭代器,用于全表扫描 |
BlockBuilder |
构建 4KB 数据块 |
BlockIter |
块内条目迭代器 |
IndexBuilder |
从首 key 构建稀疏索引 |
IndexSearcher |
稀疏索引二分查找 |
BlockCache |
线程安全 LRU 块缓存 |
Footer |
SST 文件元数据 (64 字节) |
| 常量 | 值 | 说明 |
|---|---|---|
BLOCK_SIZE |
4096 | 数据块大小 (字节) |
FOOTER_SIZE |
64 | Footer 大小 (字节) |
| 错误 | 说明 |
|---|---|
InvalidFooter |
Footer 格式无效 |
MagicMismatch |
魔数不匹配 |
InvalidIndex |
索引格式无效 |
EmptySst |
SST 文件为空 |
FileTooSmall |
文件小于最小尺寸 |
KeyTooLarge |
Key 超出块容量 |
IndexTooLarge |
索引条目数溢出 |
jdb_idx/
├── src/
│ ├── lib.rs # 模块导出
│ ├── block.rs # BlockBuilder, BlockIter
│ ├── cache.rs # BlockCache (LRU)
│ ├── error.rs # 错误类型
│ ├── index.rs # IndexBuilder, IndexSearcher
│ ├── layout.rs # Footer, 常量
│ ├── reader.rs # SstReader, SstIter
│ └── writer.rs # SstWriter
├── tests/
│ └── main.rs # 集成测试
└── Cargo.toml
| 组件 | 库 | 用途 |
|---|---|---|
| 异步 I/O | compio | 基于 io_uring 的异步运行时 |
| 缓冲区 | bytes | 零拷贝字节缓冲 |
| 缓存 | lru + parking_lot | 线程安全 LRU 缓存 |
| 错误 | thiserror | 错误类型派生 |
| 内存分配 | jdb_alloc | 页对齐缓冲区分配 |
| 文件 | jdb_fs | Direct I/O 文件操作 |
SSTable 格式源自 Google 2006 年发表的 Bigtable 论文,该论文提出了 LSM-Tree 架构来处理写密集型负载。核心思想是将写入(顺序、快速)与读取(随机、较慢)分离:写入先缓存在内存中,定期刷新为磁盘上的有序文件。
LevelDB(2011)将 SSTable 带入开源世界,随后 RocksDB(2012)添加了大量优化。4KB 块大小与文件系统页大小和 SSD 擦除块对齐,使 Direct I/O 能够绕过内核页缓存,获得可预测的延迟。
稀疏索引技术以内存换磁盘寻道:不索引每个 key,只存储每块的首 key。这将索引大小减少约 100 倍,同时每次查询最多增加一次磁盘读取。
有趣的是,"SSTable" 这个名字中的 "String" 其实是历史遗留。在 Bigtable 中,所有数据都被视为字节串(byte string),因此得名。现代实现中,SSTable 存储的是任意二进制键值对,早已超越了字符串的范畴。
本项目为 js0.site ⋅ 重构互联网计划 的开源组件。
我们正在以组件化的方式重新定义互联网的开发范式,欢迎关注: