# RleVec

A Rust crate providing a vector-like struct that stores data as runs of identical values.

If your data consists of long stretches of identical values, it can be beneficial to only store the number of times each value occurs. This can result in significant space savings, but there is a cost: accessing an arbitrary index requires a binary search over the stored runs, resulting in `O(log n)` complexity versus `O(1)` for a normal vector. Other complexities are given in the table below, where `n` is the number of runs, not the length of a comparable `Vec`.

|        |push|index   |set (breaking a run)|set (no break)|insert (breaking a run)|insert (no break)|
|--------|----|--------|--------------------|--------------|-----------------------|-----------------|
|`RleVec`|O(1)|O(log n)|O((log n) + 2n)     |O(log n)      |O((log n) + 2n)        |O((log n) + n)   |
|`Vec`   |O(1)|O(1)    |O(1)*               |              |O(n)                   |                 |

The `RleVec` struct handles like a normal vector and supports a subset of the `Vec` methods.

## Usage

Put this in your `Cargo.toml`:

```toml
[dependencies]
rle_vec = "0.4"
```

and add this to your crate root:

```rust
extern crate rle_vec;
```

## Examples

```rust
use rle_vec::RleVec;

let mut rle = RleVec::new();

rle.push(10);
rle.push(10);
rle.push(11);

assert_eq!(rle[1], 10);
assert_eq!(rle[2], 11);

rle.insert(1, 10);
assert_eq!(rle.runs_len(), 2);

// Setting index 0 to a different value splits a new run off the front.
rle.set(0, 1);
assert_eq!(rle.runs_len(), 3);
```

`RleVec` can be constructed from `Iterator`s and iterated over just like a `Vec`.

```rust
use rle_vec::RleVec;

let v = vec![0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 4, 4, 4];

let rle: RleVec<_> = v.into_iter().collect();

assert_eq!(rle.len(), 15);
assert_eq!(rle.runs_len(), 7);
assert_eq!(rle.iter().nth(10), Some(&4));
```

You can get the value at an index.
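To illustrate where the `O(log n)` lookup cost comes from, here is a minimal self-contained sketch of the underlying idea. The `MiniRle` type below is hypothetical and greatly simplified, not the crate's actual implementation: it stores each run as a value plus the run's exclusive end index in the "virtual" vector, and answers index queries with a binary search over those end positions.

```rust
// Hypothetical simplified sketch of run-length-encoded indexing,
// not `rle_vec`'s actual internals.
pub struct MiniRle<T> {
    ends: Vec<usize>, // ends[i] is the exclusive end index of run i
    values: Vec<T>,   // values[i] is the value repeated in run i
}

impl<T: PartialEq> MiniRle<T> {
    pub fn new() -> Self {
        MiniRle { ends: Vec::new(), values: Vec::new() }
    }

    /// O(1): extend the last run, or start a new one.
    pub fn push(&mut self, value: T) {
        let len = self.len();
        if self.values.last() == Some(&value) {
            *self.ends.last_mut().unwrap() = len + 1;
        } else {
            self.ends.push(len + 1);
            self.values.push(value);
        }
    }

    pub fn len(&self) -> usize {
        *self.ends.last().unwrap_or(&0)
    }

    pub fn runs_len(&self) -> usize {
        self.values.len()
    }

    /// O(log n) in the number of runs: binary search for the run
    /// that covers `index`.
    pub fn get(&self, index: usize) -> Option<&T> {
        if index >= self.len() {
            return None;
        }
        let run = match self.ends.binary_search(&index) {
            // `index` equals the end of run `i`, so it falls in run `i + 1`.
            Ok(i) => i + 1,
            // `index` sorts before `ends[i]`, so run `i` covers it.
            Err(i) => i,
        };
        Some(&self.values[run])
    }
}
```

The binary search over the `n` run ends dominates a lookup, which is where the `O(log n)` index cost in the table comes from.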
Values can be changed with `RleVec::set` and inserted at arbitrary positions with `RleVec::insert`:

```rust
use rle_vec::RleVec;

let v = vec![0, 0, 0, 1, 1, 1, 1, 2, 2, 3];

let mut rle: RleVec<_> = v.into_iter().collect();

rle.set(1, 2);
rle.insert(4, 4);

let v = rle.iter().cloned().collect::<Vec<_>>();
assert_eq!(v, vec![0, 2, 0, 1, 4, 1, 1, 1, 2, 2, 3]);
```

`RleVec::set` and `RleVec::insert` require `T: Clone`.

Not all methods implemented on `Vec` are implemented for `RleVec`. Methods that return slices cannot work for `RleVec`, because the values are not stored contiguously.

## Serialization

[Serde](https://serde.rs/) support for serialization is available as a cargo feature. You can enable the feature in the `Cargo.toml` `dependencies` section:

```toml
[dependencies]
rle_vec = { version = "0.4.0", features = ["serialize"] }
```

## Intended use

* Allocating gigantic vectors with a starting value and (randomly) updating positions, under the assumption that the data remains sparse. The update step will be slower than with a `Vec`.
* Obviously, the savings are bigger when the size of `T` in `RleVec<T>` is large.
* Reducing the size of an in-memory structure, for example before serializing it.

## Benchmarks

`cargo bench` can be used to compare the real-life difference of get/set/insert/remove operations on a `Vec` and an `RleVec`. Warning: some tests involve reallocations.

### Creation

Creation appears to be fastest from a slice, unless the data is very sparse.

```
rle_create_1000_runs_of_10_values_from_iter  ... bench:  19,561 ns/iter (+/- 342)
vec_create_1000_runs_of_10_values_from_iter  ... bench:  22,221 ns/iter (+/- 582)
rle_create_1000_runs_of_10_values_from_slice ... bench:   6,076 ns/iter (+/- 324)
vec_create_1000_runs_of_10_values_from_slice ... bench:     894 ns/iter (+/- 45)
rle_create_10_000_equal_values_from_iter     ... bench:      15 ns/iter (+/- 1)
vec_create_10_000_equal_values_from_iter     ... bench:   7,683 ns/iter (+/- 547)
rle_create_10_000_equal_values_from_slice    ... bench:   4,490 ns/iter (+/- 57)
vec_create_10_000_equal_values_from_slice    ... bench:     898 ns/iter (+/- 48)
rle_create_10_000_unique_values_from_iter    ... bench:  25,510 ns/iter (+/- 789)
vec_create_10_000_unique_values_from_slice   ... bench:     891 ns/iter (+/- 47)
rle_create_10_000_unique_values_from_slice   ... bench:  25,936 ns/iter (+/- 278)
vec_create_10_000_unique_values_from_iter    ... bench:     921 ns/iter (+/- 25)
```

### Access

Random access pays the binary search penalty, but iterating performs reasonably.

```
rle_iterate_1000_runs_of_10_values       ... bench: 19,773 ns/iter (+/- 481)
vec_iterate_1000_runs_of_10_values       ... bench:  4,981 ns/iter (+/- 2,171)
rle_iterate_10_000_equal_values          ... bench: 20,878 ns/iter (+/- 538)
vec_iterate_10_000_equal_values          ... bench:  5,149 ns/iter (+/- 124)
rle_iterate_10_000_unique_values         ... bench: 20,130 ns/iter (+/- 340)
vec_iterate_10_000_unique_values         ... bench:  7,784 ns/iter (+/- 127)
rle_random_access_1000_runs_of_10_values ... bench: 34,999 ns/iter (+/- 632)
vec_random_access_1000_runs_of_10_values ... bench:    499 ns/iter (+/- 11)
```

### Insertion

Inserting data is competitive and can be faster when no run needs to be broken.

```
rle_insert_runmids_breaking_1000_runs_of_10_values     ... bench: 308,797 ns/iter (+/- 23,860)
rle_insert_runmids_non_breaking_1000_runs_of_10_values ... bench: 171,507 ns/iter (+/- 2,669)
vec_insert_runmids_1000_runs_of_10_values              ... bench: 191,124 ns/iter (+/- 5,439)
```

### Set

Mutable indexing can have very different outcomes. The minimum cost is the binary search, but depending on the new value, runs can be merged or split.

```
test rle_set_runmids_breaking_1000_runs_of_10_values     ... bench: 177,418 ns/iter (+/- 2,718)
test rle_set_runmids_non_breaking_1000_runs_of_10_values ... bench:  34,844 ns/iter (+/- 528)
test rle_set_runs_merging_1000_runs                      ... bench:  97,703 ns/iter (+/- 1,521)
test vec_set_runmids_1000_runs_of_10_values              ... bench:     908 ns/iter (+/- 11)
test vec_set_runs_merging_1000_runs                      ... bench:     785 ns/iter (+/- 27)
```

### Deletion

Removing values can have very different outcomes.
The minimum cost is the binary search, but depending on the removed value, runs can be merged or split. Remove is also a lot more expensive for a `Vec`.

```
test rle_remove_runmids_non_breaking_1000_runs_of_10_values ... bench: 184,023 ns/iter (+/- 5,367)
test rle_remove_runs_merging_1000_runs                      ... bench: 182,233 ns/iter (+/- 5,122)
test vec_remove_runmids_1000_runs_of_10_values              ... bench: 270,981 ns/iter (+/- 6,258)
test vec_remove_runs_merging_1000_runs                      ... bench:  66,948 ns/iter (+/- 1,460)
```
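The range of outcomes for remove can be sketched with a small self-contained example. The helper below is hypothetical, working on a plain `Vec` of `(value, length)` pairs rather than the crate's internals, and a linear scan stands in for the binary search: shrinking a run is O(1) after locating it, while deleting an emptied run and merging its now-adjacent equal neighbours each shift the tail of the runs vector, which is O(n) in the number of runs.

```rust
// Hypothetical sketch of removing one element from a run-length
// encoding stored as (value, length) pairs; not `rle_vec`'s code.
fn remove_at<T: PartialEq>(runs: &mut Vec<(T, usize)>, mut index: usize) {
    // Locate the run covering `index` (linear scan here; the crate's
    // cost model assumes a binary search over run ends instead).
    let mut i = 0;
    while index >= runs[i].1 {
        index -= runs[i].1;
        i += 1;
    }
    if runs[i].1 > 1 {
        runs[i].1 -= 1; // cheap case: the run just shrinks
        return;
    }
    runs.remove(i); // the run became empty: O(n) shift of the tail
    // If the runs around the removed one hold equal values, merge them.
    if i > 0 && i < runs.len() && runs[i - 1].0 == runs[i].0 {
        runs[i - 1].1 += runs[i].1;
        runs.remove(i); // another O(n) shift
    }
}
```

For example, removing the lone `2` from runs `[(1, 2), (2, 1), (1, 3)]` empties its run and then merges the two surrounding runs of `1`s into `[(1, 5)]`, hitting both shifts, whereas removing an element from a long run only decrements a counter.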