# Word Level Parallelism [![Crates.io][bp-crates-badge]][bp-crates-url] [![Documentation][bp-docs-badge]][bp-docs-url] Bit level algorithms useful when developing specialized integer data structures such as the [x-fast trie](http://web.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/15/Small15.pdf) ## Overview Arithmetic and logical operations take, for all intents and purposes, constant time. Such operations operate on whole words. (A word is the size of a single memory segment. This crate, assumes a word size width of `64`. For a more in-depth discussion of computer memory, refer to [this note](https://akkadia.org/drepper/cpumemory.pdf)). For instance, it takes constant time to add two `64` bit numbers. The central idea behind the algorithms in this library is this: * If you have a bunch of small integers — each smaller that sixty four bits, e.g. a bunch of bytes, we can pack many of them into a single sixty four bit integer. * We can then operate on that packed integer as if it were a single number. For example, we can fit 8 byte sized numbers in a single word. * By operating on the packed integer, we are in effect operating on 8 different integers in parallel. This is what is called `world level parallelism`. ## Algorithms The Algorithms implemented include: ### Finding the `top(k)` bits of an integer The first procedure is quite simple. The goal is, given a number `x` and a length `k`, to extract the first `k` bits of `x` in `O(1)`. A procedure that does this will be handy when implementing the x-fast trie. ### The `SardineCan` Structure Suppose we wish to maintain a set of small sized integers in a B-Tree. And suppose too that we wish to take advantage of the fact that we can fit many of these integers in a single, larger integer. How would we go about designing a single node in such a B-Tree? Recall that a B-Tree of order `b` is a multi-way search tree in which each node is a bucket that must contain between `b - 1` and `2b - 1` keys. Furthermore, each node has one more child than the number of keys it contains. That is, each node must have between `b` and `2b` child nodes. Operations on B-Trees rely on one key operation: `node.rank(x)`. This operation searches through the keys of a single node (which are sorted) and either returns the location of `x` in the node, or the index of the child we need to descend into in order to complete the operation at hand. In run of the mill B-Trees, `node.rank(x)` is implemented using binary search and thus takes `O(lg b)`. However, if our keys are small integers, we can perform `node.rank(x)` in `O(1)`. The `SardineCan` implements a B-Tree Node specialized for storing small integers. ### `O(1)` Most Significant Bit: FourRussiansMSB When we talk of the most significant bit of a number, we're often referring to the 0-indexed location of the highest bit set. Note that this is a more general problem than simply finding the number that would be formed if only the `msb` were set. For instance, `MSB(010010001)` is `7` and not `128`. The simplest method for finding this index in by doing a linear scan over the bits of the number in question while keeping a count of the number of bits seen thus far. This scheme runs in `O(lg n)` where `n` is the highest number our function may operate on. ```rust /// A procedure for finding the index of the most significant /// bit in time linear to the number of bits used /// to represent the value. fn get_msb_idx_of(query: u64) -> u8 { for i in (0..64).rev() { if query & (1 << i) != 0 { return i; } } panic!("MSB(0) is undefined") } ``` We can improve upon the linear scanning procedure using bit level binary search. This brings down the running time to `O(lg lg n)`. Often, however, when we know that we'll be doing many `msb` queries, we use a lookup table to compute this value. Using that method, we're able to locate the index of the highest bit set in constant `O(1)` time, albeit with an added preprocessing step to build the lookup table. We can, using bit level parallelism, locate the index of the most significant bit in constant time without using a lookup table. ## References 1. [CS 166 Lecture 15](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/15/Slides15.pdf) 2. [CS 166 Lecture 16](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/16/Slides16.pdf) 3. [CS 166 Lecture 17](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/17/Slides17.pdf) 4. [6.851](http://courses.csail.mit.edu/6.851/fall17/scribe/lec12.pdf) 5. [The Original Fusion Tree Paper](https://reader.elsevier.com/reader/sd/pii/0022000093900404?token=1610EF62181DAC974715067B85459A4709A9BC64E39827CE0369C6C8E18540DFD1DBAD38BEE35BFF95C4C05E45A1D1D5) 6. [This StackOverflow Question. Scroll down until you find the answer by user `templatetypedef`](https://stackoverflow.com/questions/3878320/understanding-fusion-trees) [bp-crates-badge]: https://img.shields.io/crates/v/bit-parallelism.svg?style=for-the-badge&logo=rust [bp-crates-url]: https://crates.io/crates/bit-parallelism [bp-docs-badge]: https://img.shields.io/docsrs/bit-parallelism/latest?style=for-the-badge&logo=docs.rs [bp-docs-url]: https://docs.rs/bit-parallelism