shardio

Crates.io: shardio
lib.rs: shardio
version: 0.8.2
source: src
created_at: 2019-06-28 14:03:52.931916
updated_at: 2021-11-08 17:06:44.454929
description: Out-of-memory sorting and streaming of large datasets
homepage:
repository: https://github.com/10XGenomics/rust-shardio
max_upload_size:
id: 144312
size: 88,088
crates_io (github:10xgenomics:crates_io)

documentation

https://10xgenomics.github.io/rust-shardio

README

rust-shardio


Library for out-of-memory sorting of large datasets which need to be processed in multiple map / sort / reduce passes.

You write a stream of items of type T implementing Serialize and Deserialize to a ShardWriter. The items are buffered, sorted according to a customizable sort key, then serialized to disk in chunks with serde + lz4, while an index records the position and key range of each chunk. You then use a ShardReader to stream through the items in a selected interval of the key space, in sorted order.
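The buffer / sort / chunk / index / range-read flow described above can be illustrated with a small standard-library sketch. This is not shardio's API: `ChunkedSorter`, `Chunk`, `push`, `finish`, and `read_range` are hypothetical names, the key is assumed to be a `u64`, and the chunks are kept in memory where shardio would write serialized, lz4-compressed blocks to disk.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// One sorted chunk plus the key range it covers (its index entry).
/// Hypothetical stand-in for a compressed block on disk.
struct Chunk {
    min_key: u64,
    max_key: u64,
    items: Vec<(u64, String)>,
}

/// Hypothetical writer/reader pair rolled into one in-memory struct.
struct ChunkedSorter {
    buffer: Vec<(u64, String)>,
    buffer_cap: usize,
    chunks: Vec<Chunk>,
}

impl ChunkedSorter {
    fn new(buffer_cap: usize) -> Self {
        ChunkedSorter { buffer: Vec::new(), buffer_cap, chunks: Vec::new() }
    }

    /// Buffer an item; once the buffer is full, sort it and emit a chunk.
    fn push(&mut self, key: u64, value: String) {
        self.buffer.push((key, value));
        if self.buffer.len() >= self.buffer_cap {
            self.flush();
        }
    }

    /// Sort the buffered items by key and record the chunk's key range.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let mut items = std::mem::take(&mut self.buffer);
        items.sort_by_key(|(k, _)| *k);
        let min_key = items[0].0;
        let max_key = items[items.len() - 1].0;
        self.chunks.push(Chunk { min_key, max_key, items });
    }

    /// Flush whatever is still buffered so that every item is readable.
    fn finish(&mut self) {
        self.flush();
    }

    /// Return all flushed items with lo <= key < hi, in sorted order.
    /// The index skips chunks outside the range; a k-way merge over the
    /// remaining chunks yields the items in key order.
    fn read_range(&self, lo: u64, hi: u64) -> Vec<(u64, String)> {
        let selected: Vec<&Chunk> = self
            .chunks
            .iter()
            .filter(|c| c.max_key >= lo && c.min_key < hi)
            .collect();

        // Min-heap of (key, chunk index, position within chunk).
        let mut heap = BinaryHeap::new();
        for (ci, c) in selected.iter().enumerate() {
            // First position in this chunk whose key is >= lo.
            let start = c.items.partition_point(|(k, _)| *k < lo);
            if start < c.items.len() && c.items[start].0 < hi {
                heap.push(Reverse((c.items[start].0, ci, start)));
            }
        }

        let mut out = Vec::new();
        while let Some(Reverse((key, ci, pos))) = heap.pop() {
            let items = &selected[ci].items;
            out.push((key, items[pos].1.clone()));
            let next = pos + 1;
            if next < items.len() && items[next].0 < hi {
                heap.push(Reverse((items[next].0, ci, next)));
            }
        }
        out
    }
}
```

In this sketch a caller would create a `ChunkedSorter`, call `push` for each item, call `finish` once, and then call `read_range(lo, hi)` for each key interval of interest. Because each chunk carries its own key range, a range read only has to touch the chunks that overlap the requested interval.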

See Docs for API and examples.

Note: Enable the 'full-test' feature in Release mode to turn on some long-running stress tests.

Commit count: 164
