# Column # `column` is a Rust library to represent vectors of a certain type in a columnar format. The columnar representation is beneficial when iterating a large number of elements but only looking at a subset of a type's fields. ## License Licensed under either of * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0) * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT) at your option. ### Contribution Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions. # An example To use `column`, add the following dependency to your project's `Cargo.toml`: ```toml [dependencies] column = { git = "https://github.com/antiguru/column-rs.git" } ``` This will bring in the `column` crate from Github (this will hopefully change!), which should allow you to use regular structs in a column-based memory layout. ```rust extern crate column; #[macro_use] extern crate column_derive; #[derive(Column, Debug)] struct Data { id: usize, val: f64, } fn main() { let mut u = ::new(); let ds = vec![Data { id: 0, val: 3.141 }, Data { id: 1, val: 42.}]; u.extend(ds); for e in u.iter() { println!("Element: {:?}", e); } for mut e in u.iter_mut() { *e.val *= 2.; } for e in u.iter() { println!("Element: {:?}", e); } } ``` This example is contained in the `column` crate, you can run it from the crate's root directory by typing ``` % cargo run --example columnar Running `target/debug/examples/columnar` Element: DataRef { id: 0, val: 3.141 } Element: DataRef { id: 1, val: 42 } Element: DataRef { id: 0, val: 6.282 } Element: DataRef { id: 1, val: 84 } ``` # Generic types The generated code dereferences values to materialize elements for `to_owned`. That means that the value has to implement the `Copy` trait for this to work. Otherwise, Rust will complain about it. For example, this will produce a working generic columnar type: ```rust #[derive(Column)] struct DataGen { id: A, } ``` # Filtered collections When using columnar types, they might be passed to different downstream functionality without exposing all elements in the collection. To avoid intermediate copies, this carte contains a filtered collection (`FilteredCollection`.) It is a wrapper around an `&IntoIterator` combined with a `Vec` that stores which items are available in the target collection. The following example instantiates a `FilteredCollection` and uses its `retain` method to only retain a subset of elements in the collection. Note that this does not change the underlying data. ```rust use column::bitmap::FilteredCollection; let mut bitmap_container = FilteredCollection::new(&container, container.len()); bitmap_container.retain(|u| p(u)); ``` # Debugging Column creates the required implementations during the compilation process. Sometimes things go wrong and debugging the generated code is rather tedious. For this reason, a flag `verbose` exists. It forces `column` to write the intermediate code to the target directory. For a type `Data`, it will generate the file `target/derive_column_Data.rs`. (If there's a more elegant way, please open an issue or send a pull request!) Insert the following snippet in `Cargo.toml` to enable verbose output: ```toml [dependencies] column = { git = "https://github.com/antiguru/column-rs.git", features = [ "verbose" ] } ``` The feature can also be activated on the command line when working on this project. The following `cargo` invocation tests and dumps the intermediate files: ``` cargo test --features verbose ``` # Performance There's a small benchmark. It shows that the columnar format can be substantially faster for certain operations. Run it using Rust nightly using something like this: ``` % rustup run nightly cargo bench [...] running 6 tests test data_bitmap_column_add_assign ... bench: 13,178,990 ns/iter (+/- 405,043) = 1909 MB/s test data_bitmap_vec_add_assign ... bench: 39,891,687 ns/iter (+/- 714,537) = 630 MB/s test data_column ... bench: 1,949,104 ns/iter (+/- 79,882) = 4303 MB/s test data_column_add_assign ... bench: 5,495,641 ns/iter (+/- 203,841) = 4579 MB/s test data_row ... bench: 16,817,910 ns/iter (+/- 264,722) = 498 MB/s test data_row_add_assign ... bench: 32,581,004 ns/iter (+/- 759,734) = 772 MB/s test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out ``` Take the performance numbers with a grain of salt. The speedup from using a columnar representation originates from loading less and more dense data into memory. It will only show any benefit of not all elements of a struct are accessed in a tight loop, because only then we can reduce the number of bytes transferred into the CPU. However, when all data is touched, i.e. all columns have to be loaded, the speedup may be negligible or even negative. Also, there is a cost associated with transforming a row-based representation into a column representation.