# ar_row-rs

Row-oriented access to Apache Arrow

Currently, it only allows reading arrays, not building them.

Arrow is a column-oriented data storage format designed to be stored in memory.
While a columnar layout is very efficient, it can be cumbersome to work with, so this
crate provides a way to work on rows by "zipping" columns together into classic
Rust structures.

This crate was forked from [orcxx](https://crates.io/crates/orcxx), an ORC parsing library,
by removing the bindings to the underlying ORC C++ library and rewriting the high-level
API to operate on Arrow instead of ORC-specific structures.

The `ar_row_derive` crate provides a custom `derive` macro.

```rust
extern crate ar_row;
extern crate ar_row_derive;
extern crate orc_rust;

use std::fs::File;
use std::num::NonZeroU64;

use orc_rust::projection::ProjectionMask;
use orc_rust::{ArrowReader, ArrowReaderBuilder};

use ar_row::deserialize::{ArRowDeserialize, ArRowStruct};
use ar_row::row_iterator::RowIterator;
use ar_row_derive::ArRowDeserialize;

// Define structure
#[derive(ArRowDeserialize, Clone, Default, Debug, PartialEq, Eq)]
struct Test1 {
    long1: Option<i64>,
}

// Open file
let orc_path = "../test_data/TestOrcFile.test1.orc";
let file = File::open(orc_path).expect("could not open .orc");
let builder = ArrowReaderBuilder::try_new(file).expect("could not make builder");

// Only read the "long1" column
let projection = ProjectionMask::named_roots(
    builder.file_metadata().root_data_type(),
    &["long1"],
);
let reader = builder.with_projection(projection).build();

// Deserialize each record batch into rows, then concatenate them
let rows: Vec<Option<Test1>> = reader
    .flat_map(|batch| -> Vec<Option<Test1>> {
        <Option<Test1>>::from_record_batch(batch.unwrap()).unwrap()
    })
    .collect();

assert_eq!(
    rows,
    vec![
        Some(Test1 {
            long1: Some(9223372036854775807)
        }),
        Some(Test1 {
            long1: Some(9223372036854775807)
        })
    ]
);
```

## `RowIterator` API

This API allows reusing the buffer between record batches, but needs `RecordBatch` instead of `Result<RecordBatch, _>` as input.

```rust
extern crate ar_row;
extern crate ar_row_derive;
extern crate orc_rust;

use std::fs::File;
use std::num::NonZeroU64;

use orc_rust::projection::ProjectionMask;
use orc_rust::{ArrowReader, ArrowReaderBuilder};

use ar_row::deserialize::{ArRowDeserialize, ArRowStruct};
use ar_row::row_iterator::RowIterator;
use ar_row_derive::ArRowDeserialize;

// Define structure
#[derive(ArRowDeserialize, Clone, Default, Debug, PartialEq, Eq)]
struct Test1 {
    long1: Option<i64>,
}

// Open file
let orc_path = "../test_data/TestOrcFile.test1.orc";
let file = File::open(orc_path).expect("could not open .orc");
let builder = ArrowReaderBuilder::try_new(file).expect("could not make builder");

// Only read the "long1" column
let projection = ProjectionMask::named_roots(
    builder.file_metadata().root_data_type(),
    &["long1"],
);
let reader = builder.with_projection(projection).build();

// RowIterator takes RecordBatch values, so unwrap each Result first
let rows: Vec<Option<Test1>> = RowIterator::new(reader.map(|batch| batch.unwrap()))
    .expect("Could not create iterator")
    .collect();

assert_eq!(
    rows,
    vec![
        Some(Test1 {
            long1: Some(9223372036854775807)
        }),
        Some(Test1 {
            long1: Some(9223372036854775807)
        })
    ]
);
```

## Nested structures

The above two examples also work with nested structures:

```rust
extern crate ar_row;
extern crate ar_row_derive;

use ar_row_derive::ArRowDeserialize;

#[derive(ArRowDeserialize, Default, Debug, PartialEq)]
struct Test1Option {
    boolean1: Option<bool>,
    byte1: Option<i8>,
    short1: Option<i16>,
    int1: Option<i32>,
    long1: Option<i64>,
    float1: Option<f32>,
    double1: Option<f64>,
    bytes1: Option<Box<[u8]>>,
    string1: Option<String>,
    list: Option<Vec<Option<Test1ItemOption>>>,
}

#[derive(ArRowDeserialize, Default, Debug, PartialEq)]
struct Test1ItemOption {
    int1: Option<i32>,
    string1: Option<String>,
}
```
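
## What "zipping" columns means

As a rough illustration of the "zipping" described in the introduction, the sketch below does by hand, in plain Rust with only the standard library, what a row-oriented view of columnar data amounts to. It is not this crate's implementation or API: `Columns`, `Row`, `zip_rows`, and `example_rows` are hypothetical names, and real Arrow arrays store nulls in a separate validity bitmap rather than as `Vec<Option<_>>`.

```rust
// Conceptual sketch only: hand-written column "zipping".
// All names here are hypothetical and not part of ar_row.

/// A simplified columnar batch: one vector per field, `None` marking a null.
struct Columns {
    long1: Vec<Option<i64>>,
    string1: Vec<Option<String>>,
}

/// The row-shaped structure we would rather work with.
#[derive(Debug, PartialEq)]
struct Row {
    long1: Option<i64>,
    string1: Option<String>,
}

/// Zips the two columns together, producing one `Row` per index.
/// If the columns differ in length, the extra values are dropped.
fn zip_rows(columns: Columns) -> Vec<Row> {
    columns
        .long1
        .into_iter()
        .zip(columns.string1)
        .map(|(long1, string1)| Row { long1, string1 })
        .collect()
}

/// Builds a two-row batch and converts it to rows.
fn example_rows() -> Vec<Row> {
    zip_rows(Columns {
        long1: vec![Some(1), None],
        string1: vec![Some("a".to_string()), Some("b".to_string())],
    })
}
```

The derive macro shown earlier spares you from writing a function like `zip_rows` for every structure; the sketch is only meant to convey the shape of the transformation.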