range-reader

Crates.io: range-reader
lib.rs: range-reader
version: 0.2.0
source: src
created_at: 2021-11-20 08:00:06.394866
updated_at: 2022-05-17 06:39:42.332679
description: Converts low-level APIs to read ranges of bytes to `Read + Seek`
homepage: https://github.com/DataEngineeringLabs/ranged-reader-rs
repository: https://github.com/DataEngineeringLabs/ranged-reader-rs
id: 484847
size: 20,216
Jorge Leitao (jorgecarleitao)

README

Ranged reader

Convert low-level APIs that read ranges of files into structs that implement Read + Seek and AsyncRead + AsyncSeek. See parquet_s3_async.rs for an example that uses this API to read parts of a large Parquet file from S3 asynchronously.

Rationale

Blob storage HTTPS APIs offer the ability to read ranges of bytes from a single blob, i.e. functions of the form

fn read_range_blocking(path: &str, start: usize, length: usize) -> Vec<u8>;
async fn read_range(path: &str, start: usize, length: usize) -> Vec<u8>;

together with functions that return the blob's total size,

async fn length(path: &str) -> usize;
fn length(path: &str) -> usize;

These APIs are usually IO-bound: they spend most of their time waiting for the network.
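To make the shape of these functions concrete, here is a minimal sketch of a blocking range API backed by an in-memory map rather than a network service. The type `InMemoryStore` and its `put` method are hypothetical names introduced only for illustration; a real blob store client would issue HTTP range requests instead of slicing a local buffer.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a blob store (illustration only).
pub struct InMemoryStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl InMemoryStore {
    pub fn new() -> Self {
        Self { blobs: HashMap::new() }
    }

    /// Stores `bytes` under `path`, replacing any previous blob.
    pub fn put(&mut self, path: &str, bytes: Vec<u8>) {
        self.blobs.insert(path.to_string(), bytes);
    }

    /// Total size of the blob in bytes (0 if the blob does not exist).
    pub fn length(&self, path: &str) -> usize {
        self.blobs.get(path).map_or(0, |blob| blob.len())
    }

    /// Returns up to `length` bytes starting at `start`, clamped to the blob's end.
    pub fn read_range_blocking(&self, path: &str, start: usize, length: usize) -> Vec<u8> {
        match self.blobs.get(path) {
            Some(blob) => {
                let begin = start.min(blob.len());
                let end = start.saturating_add(length).min(blob.len());
                blob[begin..end].to_vec()
            }
            // A missing blob yields no bytes in this sketch; a real client
            // would more likely return an error.
            None => Vec::new(),
        }
    }
}
```

Clamping the requested range to the blob's end is a design choice of this sketch; actual services differ in how they treat out-of-range requests.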

Some file formats (e.g. Apache Parquet, Apache Avro, Apache Arrow IPC) allow reading only parts of a file, which enables filter and projection pushdown.
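As an illustration of why partial reads matter, a Parquet reader typically starts from the end of the file: the last 8 bytes hold a 4-byte little-endian metadata length followed by the magic bytes PAR1. The sketch below reads only that trailer through Read + Seek. The function name `parquet_metadata_len` is hypothetical, and the code assumes the trailer layout just described; it is not taken from this crate or from a Parquet library.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: reads the 8-byte Parquet trailer and returns the
/// length in bytes of the file metadata that precedes it.
pub fn parquet_metadata_len<R: Read + Seek>(reader: &mut R) -> io::Result<u32> {
    // Jump straight to the last 8 bytes instead of reading the whole file.
    reader.seek(SeekFrom::End(-8))?;
    let mut tail = [0u8; 8];
    reader.read_exact(&mut tail)?;

    // The final 4 bytes must be the magic marker.
    if tail[4..] != b"PAR1"[..] {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing PAR1 magic at end of file",
        ));
    }

    // The 4 bytes before the magic are the metadata length, little-endian.
    Ok(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}
```

With a reader backed by range requests, this costs one small request rather than a download of the entire file.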

This crate offers two structs, RangedReader and RangedStreamer, that implement Read + Seek and AsyncRead + AsyncSeek respectively, to bridge the blob storage APIs mentioned above to the traits used by most Rust APIs to read bytes.
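The bridging idea can be sketched in a few lines for the blocking case. The `SketchReader` type below is hypothetical and is not this crate's RangedReader: it assumes a range function of the form `Fn(usize, usize) -> io::Result<Vec<u8>>` plus a known total length, and it issues one range call per read without any buffering.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical adapter (not this crate's `RangedReader`): wraps a blocking
/// range function and a known total length, and exposes `Read + Seek`.
pub struct SketchReader<F: Fn(usize, usize) -> io::Result<Vec<u8>>> {
    pos: u64,
    length: u64,
    range_fn: F,
}

impl<F: Fn(usize, usize) -> io::Result<Vec<u8>>> SketchReader<F> {
    pub fn new(length: u64, range_fn: F) -> Self {
        Self { pos: 0, length, range_fn }
    }
}

impl<F: Fn(usize, usize) -> io::Result<Vec<u8>>> Read for SketchReader<F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never request bytes past the end of the blob.
        let remaining = self.length.saturating_sub(self.pos);
        let want = (buf.len() as u64).min(remaining) as usize;
        if want == 0 {
            return Ok(0);
        }
        // One range call per read; a production adapter would buffer.
        let bytes = (self.range_fn)(self.pos as usize, want)?;
        let n = bytes.len().min(want);
        buf[..n].copy_from_slice(&bytes[..n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl<F: Fn(usize, usize) -> io::Result<Vec<u8>>> Seek for SketchReader<F> {
    fn seek(&mut self, target: SeekFrom) -> io::Result<u64> {
        // Seeking only moves the cursor; no bytes are fetched until `read`.
        let new_pos = match target {
            SeekFrom::Start(offset) => offset as i128,
            SeekFrom::End(offset) => self.length as i128 + offset as i128,
            SeekFrom::Current(offset) => self.pos as i128 + offset as i128,
        };
        if new_pos < 0 || new_pos > u64::MAX as i128 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "seek out of range"));
        }
        self.pos = new_pos as u64;
        Ok(self.pos)
    }
}
```

Because seeking is free and only reads touch the underlying store, a consumer that jumps around a file pays only for the ranges it actually reads. Buffering, which this sketch omits, would reduce the number of round trips for many small reads.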

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
