lsmtk ===== This library provides an implementation of an log-structured merge tree; this merge tree uses a new compaction algorithm called *triangular compaction* that achieves a factor of 6.5x write amplification both in theory and in practice. Triangular Compaction --------------------- Triangular compaction takes note of a single observation: moving from one level to the next is inefficent; instead, the compaction should spread across levels to move data across as many levels as is possible. The core of the compaction algorithm comes from the intuition embedded in this Python snippet: ```ignore sums = 0 count = 0 ingest = 0 for i in range(2**ceil(LSM_NUM_LEVELS)): bits = 0 while i and (1 << bits) & i: bits += 1 sums += (1 << bits) * TARGET_FILE_SIZE * LSM_LEVEL_0_NUM_FILES ingest += TARGET_FILE_SIZE * LSM_LEVEL_0_NUM_FILES count += 1 COMPACTION_AVERAGE_SIZE = sums / count + LSM_LEVEL_0_NUM_FILES * TARGET_FILE_SIZE # bytes WRITE_AMPLIFICATION = sums / ingest ``` In this compaction we select the lowest N levels that are full, stopping at the first level that's empty. In this way we amortize the cost of a compaction similar to how a `Vec` works in Rust or `vector` in C++. The triangular compaction algorithm generalizes this intuition to select triangles from the LSM tree such that the transitive closure of files under that level will be included in the compaction. Check `src/tree/mod.rs` for the compaction algorithm. Trade-Offs ---------- Write amplification is bounded above by 6.5x. - Read amplification is unbounded, but stays low empirically. - Space amplification is bounded by 2x. In an empirical benchmark where we ingested 44GiB of data we saw the write amplification stayed under a factor of 3x (trash is the ssts that were compacted away; it will be emptied periodically in a real system): ```ignore 4.0K db/compaction 71M db/ingest 8.8M db/mani 44G db/sst 127G db/trash ``` In process counters confirms a 2.9x write amplification. ```ignore lsmtk.bytes_ingested = 46552563486 lsmtk.compaction.bytes_written = 135401535775 ``` Status ------ Active development. Scope ----- This library provides pieces of an lsm graph. It will eventually grow to support everything necessary to create an embedded lsm graph analogous to LevelDB and RocksDB. Warts ----- - This library is under-tested and will see active development in the future. - Tricks used in LevelDB (grandfather overlap) and PebblesDB (guard pages) are not used. These are 100% compatible and would only improve the compaction algorithm's performance. - There is no back pressure against excessive ingest. - There's a concurrency bug that shows up around the point of 40GiB where compaction will stall. This is just at the prototype phase and I've run out of funds to continue developing it. Documentation ------------- The latest documentation is always available at [docs.rs](https://docs.rs/lsmtk/latest/lsmtk/).