k2_tree

Crates.iok2_tree
lib.rsk2_tree
version0.5.3
sourcesrc
created_at2020-06-03 14:03:20.387008
updated_at2022-04-12 15:59:26.373647
descriptionA space-efficient representation of sparsely populated bit-matrices.
homepage
repositoryhttps://github.com/GGabi/k2_tree
max_upload_size
id249666
size140,746
Gabriel Roels (GGabi)

documentation

https://docs.rs/k2_tree/

README

Build Status API

k2_tree

A collection designed to efficiently compress sparsely-populated bit-matrices.

See the original proposal here.

Note: This library heavily relies upon bitvec to optimally store its data, which is very slow when compiled without optimisations!

Usage

Add k2_tree into your project dependencies:

[dependencies]
k2_tree = "0.5.2"

When K2Trees are Useful:

K2Trees are extremely efficient at representing data that can be encoded as a two-dimensional bit-matrix, especially if said matrix is sparsely populated.

Take a real-world example: representing Web-Graphs.

Hyperlinks between webpages can be encoded as a 2-d bit-matrix, where each column/row corresponds to a specific page and each bit denotes whether two pages are joined via a hyperlink; 1 if yes, 0 if not.

These adjacency-matrices tend to be extremely sparse most of the time, making the K2Tree the perfect structure for encoding them!

Another example is representing Triple-Stores, which this repo demonstrates is effective.

Example:

Original Bit-Matrix:

00|00||10|10
00|00||00|11
------------
00|00||00|00
00|00||00|10
============
10|10||00|11
10|00||00|00
------------
00|00||00|00
00|00||00|00

Bit-Representation:

[0111; 1101, 1100, 0100; 1000, 1011, 0010, 1010, 1000, 1100]

(Where ; separates layers and , separates blocks)

Final K2Tree:

K2Tree {
  stem_k: 2, // usize
  leaf_k: 2, // usize
  max_slayers: 2, // usize
  stems: [0111110111000100], // BitVec
  leaves: [100010110010101010001100], // BitVec
}

For a more in-depth explanation of the compression process, check this out.

The Road to 1.0:

  • Make K2Tree work over any value of K.
  • Separate the k field into two distinct fields: stem_k, leaf_k.
  • Increase compression ratio by removing the stem_to_leaf and slayer_starts field without compromising operation complexity.
  • Implement serde's Serialize and Deserialize traits.
  • Unit test all the things.
  • Stabilise the API.

- GGabi

Commit count: 73

cargo fmt