# Versioned Log Stream Merkle Tree :warning: This repository is a **work in progress** and is **not production ready** :construction: ![Architecture](./assets/architecture.png) - **Versioned Log Stream Merkle Tree** is an authenticated key-value store optimized for large-scale use. - Origin: [LETUS](https://dl.acm.org/doi/pdf/10.1145/3626246.3653390) (See DMM-Trie plus VDLS). ## Objectives 1. **Authentication**: provide commitment (state root) and inclusion proofs for key-value pairs. 2. **Multi-version**: track and provide authenticated historical states. 3. **Performance**: minimize I/O amplifications of the underlying storage. ## Architecture ### Tree Structure ![Tree](./assets/tree.png) - The current design utilizes a 16-way trie as Merkle Tree. - Each leaf node records: - $V$ the data version. - $k$ the key. - $l$ the location of the node in the underlying storage. - $h = H(k, v)$ the Merkle hash of this node. - Each index node maintains 16 slots and a bitmap $b$ to indicate whether a slot is not empty. - Occupied slots record data version $V_c$, Merkle hash $h_c$, and a memory reference $r$ of its child node. - The Merkle hash of an index node is computed after concatenating the Merkle hash of its child nodes. - The root hash of the VLSM tree is recursively calculated with the child nodes. - Nodes are routed via 4-bit nibbles, with multiple nibbles spliced to form a nibble path. - Nibbles are derived from $H(k)$'s prefix to ensure balance and resistance to attacks. - The VLSM tree structure is deterministic given a set of key-value pairs, as the splitting and merging of VLSM tree nodes depend solely on the nibbles. - The root hash is, therefore, deterministic by the dataset, regardless of the order of the updates. ### Multi-Versioning - VLSM tree supports data multi-versioning by maintaining the data version $V$ in each node and allows queries of value data, root hash, and Merkle proof for each version. - $V$ is a comparable 64-bit integer, such as a block number or transaction number. - When data is updated, the $V$ and Merkle hash $h$ of the nodes along the vertical path from root to leaf need to be updated. - Redirect-on-write (RoW) is employed to optimize memory usage, which prevents multi-versioned data structures from copying unnecessary data. - Although historical versions are necessary for auditability and verification of data changes, they are generally infrequently accessed. - The storage engine should skew resources, such as memory, towards the latest versions to accelerate transaction processing.