Crates.io | shardio |
lib.rs | shardio |
version | 0.8.2 |
source | src |
created_at | 2019-06-28 14:03:52.931916 |
updated_at | 2021-11-08 17:06:44.454929 |
description | Out-of-memory sorting and streaming of large datasets |
homepage | |
repository | https://github.com/10XGenomics/rust-shardio |
max_upload_size | |
id | 144312 |
size | 88,088 |
Library for out-of-memory sorting of large datasets which need to be processed in multiple map / sort / reduce passes.
You write a stream of items of type T implementing Serialize and Deserialize to a ShardWriter. The items are buffered, sorted according to a customizable sort key, then serialized to disk in chunks with serde + lz4, while maintaining an index of the position and key range of each chunk. You use a ShardReader to stream through the items in a selected interval of the key space, in sorted order.
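The buffer / sort / chunk / index / range-read flow described above can be illustrated with a small standard-library-only sketch. This is not shardio's API or implementation: ToyWriter, Chunk, and read_range are hypothetical names invented for illustration, chunks are kept in memory instead of being serialized to disk with serde + lz4, and the sort key is assumed to be a plain u64.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// One sorted chunk plus the key range it covers.
/// (Hypothetical stand-in for an on-disk chunk and its index entry.)
struct Chunk {
    min_key: u64,
    max_key: u64,
    items: Vec<(u64, String)>, // sorted by key
}

/// Hypothetical toy writer: buffers items and emits a sorted chunk
/// whenever the buffer reaches `capacity`.
struct ToyWriter {
    buffer: Vec<(u64, String)>,
    capacity: usize,
    chunks: Vec<Chunk>,
}

impl ToyWriter {
    fn new(capacity: usize) -> Self {
        ToyWriter { buffer: Vec::new(), capacity, chunks: Vec::new() }
    }

    fn send(&mut self, key: u64, value: &str) {
        self.buffer.push((key, value.to_string()));
        if self.buffer.len() >= self.capacity {
            self.flush();
        }
    }

    /// Sort the buffered items and record them as a chunk with its key range.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let mut items = std::mem::take(&mut self.buffer);
        items.sort_by_key(|(k, _)| *k);
        let min_key = items[0].0;
        let max_key = items[items.len() - 1].0;
        self.chunks.push(Chunk { min_key, max_key, items });
    }

    fn finish(mut self) -> Vec<Chunk> {
        self.flush();
        self.chunks
    }
}

/// Return items with start <= key < end in sorted order, using a k-way
/// merge over only those chunks whose key range overlaps the interval.
fn read_range(chunks: &[Chunk], start: u64, end: u64) -> Vec<(u64, String)> {
    let mut heap = BinaryHeap::new();
    for (ci, c) in chunks.iter().enumerate() {
        if c.max_key < start || c.min_key >= end {
            continue; // the index says this chunk cannot contain matches
        }
        // First position in the chunk whose key is >= start.
        let pos = c.items.partition_point(|(k, _)| *k < start);
        if pos < c.items.len() && c.items[pos].0 < end {
            heap.push(Reverse((c.items[pos].0, ci, pos)));
        }
    }
    let mut out = Vec::new();
    // Reverse turns the max-heap into a min-heap on (key, chunk, position).
    while let Some(Reverse((key, ci, pos))) = heap.pop() {
        out.push((key, chunks[ci].items[pos].1.clone()));
        let next = pos + 1;
        if next < chunks[ci].items.len() && chunks[ci].items[next].0 < end {
            heap.push(Reverse((chunks[ci].items[next].0, ci, next)));
        }
    }
    out
}

fn main() {
    let mut w = ToyWriter::new(3); // tiny buffer so several chunks are produced
    for k in [9u64, 2, 7, 4, 1, 8, 3, 6, 5] {
        w.send(k, &format!("item{}", k));
    }
    let chunks = w.finish();
    let keys: Vec<u64> = read_range(&chunks, 3, 8).iter().map(|(k, _)| *k).collect();
    println!("{:?}", keys); // prints [3, 4, 5, 6, 7]
}
```

The per-chunk key range is what lets a range query skip chunks entirely, and the merge step is what restores a global sorted order across chunks that were each sorted independently.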
See Docs for API and examples.
Note: Enable the 'full-test' feature in Release mode to turn on some long-running stress tests.