| Crates.io | jdb_fsst |
| lib.rs | jdb_fsst |
| version | 0.1.4 |
| created_at | 2026-01-15 07:42:58.666786+00 |
| updated_at | 2026-01-16 19:11:07.175442+00 |
| description | Fast Static Symbol Table (FSST) compression for string data / 针对字符串数据优化的快速静态符号表压缩 |
| homepage | https://github.com/js0-site/jdb/tree/main/jdb_fsst |
| repository | https://github.com/js0-site/jdb.git |
| max_upload_size | |
| id | 2044827 |
| size | 172,813 |
jdb_fsst implements Fast Static Symbol Table (FSST) algorithm for compressing string data in high-performance databases and search engines. Unlike block-based compression, it allows fast random access to compressed strings without decompressing surrounding data.
Compared to fsst: 1.3x faster encoding, 1.9x faster batch decoding, 18x faster random decoding.
use jdb_fsst::train;
use jdb_fsst::decode::Decode;
fn main() {
let lines = vec![
"hello world".as_bytes(),
"fsst is fast".as_bytes(),
"compression for databases".as_bytes(),
];
// 1. Train encoder using sample data
let encoder = train(&lines).expect("training failed");
// 2. Derive decoder from encoder
let decoder = Decode::from(&encoder);
// 3. Encode data
let mut encoded = Vec::new();
encoder.encode(lines[0], &mut encoded);
// 4. Decode data
let mut decoded = Vec::new();
decoder.decode(&encoded, &mut decoded);
assert_eq!(decoded, lines[0]);
}
The following diagram illustrates the workflow of jdb_fsst:
graph TD
A[Sample Data] --> B[train]
B --> C[Encoder]
C --> D[Symbol Table]
D --> E[Decoder]
F[Input String] --> C
C --> G[Compressed Data]
G --> E
E --> H[Decompressed String]
src/: Core implementation logic.
lib.rs: Library entry and public API.encode.rs: Compression logic.decode.rs: Decompression logic.symbol.rs: FSST symbol representation.table/: Symbol table management and construction.tests/: Integration tests and usage examples.benches/: Performance benchmarks.train<T: AsRef<[u8]>>(items: &[T]) -> io::Result<encode::Encode>
Trains an encoder based on provided sample data.encode::Encode
Finalized encoder.
encode(&self, data: &[u8], out: &mut Vec<u8>) -> usize: Compresses data.decode::Decode
Finalized decoder.
decode(&self, data: &[u8], out: &mut Vec<u8>) -> usize: Batch decompress or decompress a single item.decode_boxed(&self, data: &[u8]) -> Box<[u8]>: Handy decoder returning boxed slice.Test environment: Apple M2 Max, macOS, Rust 1.87
Test data: 1MB English and Chinese text files
| Metric | fsst | jdb_fsst | Speedup |
|---|---|---|---|
| Encoding (MB/s) | 267 | 352 | 1.32x |
| Batch Decoding (MB/s) | 3562 | 6668 | 1.87x |
| Random Decoding (MB/s) | 156 | 2936 | 18.8x |
| Compression Ratio | 52.67% | 50.45% | +2.2% |
FSST was introduced by Peter Boncz, Thomas Neumann, and Viktor Leis at VLDB 2020. It emerged from research in column-store databases where string processing often dominates execution time. Traditional LZ77-based methods like Zstd or Snappy offer good ratios but fail to provide the granularity needed for row-level access in analytical queries. FSST bridges this gap by maintaining symbols in a compact 2KB table, allowing modern CPUs to process codes at near-memory bandwidth speeds.
This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.
We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us:
jdb_fsst 实现了快速静态符号表 (FSST) 算法,专为高性能数据库和搜索引擎中的字符串压缩设计。与块压缩不同,它支持在不解压周边数据的情况下快速随机访问单个字符串。
相比 fsst:编码快 1.3 倍,批量解码快 1.9 倍,随机解码快 18 倍。
use jdb_fsst::train;
use jdb_fsst::decode::Decode;
fn main() {
let lines = vec![
"hello world".as_bytes(),
"fsst is fast".as_bytes(),
"compression for databases".as_bytes(),
];
// 1. 使用样本数据训练编码器
let encoder = train(&lines).expect("训练失败");
// 2. 从编码器导出解码器
let decoder = Decode::from(&encoder);
// 3. 编码数据
let mut encoded = Vec::new();
encoder.encode(lines[0], &mut encoded);
// 4. 解码数据
let mut decoded = Vec::new();
decoder.decode(&encoded, &mut decoded);
assert_eq!(decoded, lines[0]);
}
以下图表展示了 jdb_fsst 的工作流程:
graph TD
A[样本数据] --> B[train]
B --> C[编码器]
C --> D[符号表]
D --> E[解码器]
F[输入字符串] --> C
C --> G[压缩数据]
G --> E
E --> H[解压字符串]
src/: 核心实现逻辑。
lib.rs: 库入口及公共 API。encode.rs: 压缩逻辑。decode.rs: 解压逻辑。symbol.rs: FSST 符号表示。table/: 符号表管理与构建。tests/: 集成测试与使用示例。benches/: 性能基准测试。train<T: AsRef<[u8]>>(items: &[T]) -> io::Result<encode::Encode>
根据提供的样本数据训练并生成编码器。encode::Encode
固化的编码器对象。
encode(&self, data: &[u8], out: &mut Vec<u8>) -> usize: 执行数据压缩。decode::Decode
固化的解码器对象。
decode(&self, data: &[u8], out: &mut Vec<u8>) -> usize: 执行批量或单项目随机解压。decode_boxed(&self, data: &[u8]) -> Box<[u8]>: 返回 Box 切片的便捷解压方法。测试环境:Apple M2 Max,macOS,Rust 1.87
测试数据:1MB 中英文文本文件
| 指标 | fsst | jdb_fsst | 提升 |
|---|---|---|---|
| 编码 (MB/s) | 267 | 352 | 1.32x |
| 批量解码 (MB/s) | 3562 | 6668 | 1.87x |
| 随机解码 (MB/s) | 156 | 2936 | 18.8x |
| 压缩率 | 52.67% | 50.45% | +2.2% |
FSST 算法由 Peter Boncz、Thomas Neumann 和 Viktor Leis 在 VLDB 2020 大会上提出。该算法源于列存数据库的研究,旨在解决字符串处理占据大量执行时间的问题。传统的基于 LZ77 的方法(如 Zstd 或 Snappy)虽然压缩率高,但在分析型查询中无法提供行级访问所需的灵活性。FSST 通过将常用符号维护在一个仅 2KB 的小型表中,使现代 CPU 能以接近内存带宽的速度处理编码。
本项目为 js0.site ⋅ 重构互联网计划 的开源组件。
我们正在以组件化的方式重新定义互联网的开发范式,欢迎关注: