| Crates.io | shardio |
| lib.rs | shardio |
| version | 0.8.2 |
| created_at | 2019-06-28 14:03:52.931916+00 |
| updated_at | 2021-11-08 17:06:44.454929+00 |
| description | Out-of-memory sorting and streaming of large datasets |
| homepage | |
| repository | https://github.com/10XGenomics/rust-shardio |
| max_upload_size | |
| id | 144312 |
| size | 88,088 |
Library for out-of-memory sorting of large datasets that need to be processed in multiple map / sort / reduce passes.
You write a stream of items of type T, which must implement Serialize and Deserialize, to a ShardWriter. The items are buffered, sorted according to a customizable sort key, then serialized to disk in chunks with serde + lz4, while the writer maintains an index of the position and key range of each chunk. You then use a ShardReader to stream through the items in a selected interval of the key space, in sorted order.
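The write / index / range-read flow described above can be sketched with the standard library alone. This is an illustrative sketch, not shardio's API: ChunkStore, Chunk, send, flush, read_range and demo_sorted_range are hypothetical names, the chunks are held in memory instead of being serialized with serde + lz4, and the sort key is assumed to be a plain u64.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// One sorted chunk plus the key range it covers (its index entry).
/// Hypothetical type: a real implementation would store a file offset here.
struct Chunk {
    min_key: u64,
    max_key: u64,
    items: Vec<(u64, String)>, // stand-in for a compressed block on disk
}

/// Hypothetical in-memory stand-in for a sharded file.
struct ChunkStore {
    chunk_size: usize,
    buffer: Vec<(u64, String)>,
    chunks: Vec<Chunk>,
}

impl ChunkStore {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be positive");
        ChunkStore { chunk_size, buffer: Vec::new(), chunks: Vec::new() }
    }

    /// Buffer an item; emit a sorted chunk once the buffer is full.
    fn send(&mut self, key: u64, value: &str) {
        self.buffer.push((key, value.to_string()));
        if self.buffer.len() >= self.chunk_size {
            self.flush();
        }
    }

    /// Sort the buffered items by key and record the chunk's key range.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let mut items = std::mem::take(&mut self.buffer);
        items.sort_by_key(|(k, _)| *k);
        let min_key = items[0].0;
        let max_key = items[items.len() - 1].0;
        self.chunks.push(Chunk { min_key, max_key, items });
    }

    /// Return all items with start <= key < end, in sorted order.
    /// The index skips chunks outside the range; a min-heap merges the rest.
    fn read_range(&self, start: u64, end: u64) -> Vec<(u64, String)> {
        let mut heap = BinaryHeap::new();
        for (ci, chunk) in self.chunks.iter().enumerate() {
            if chunk.max_key < start || chunk.min_key >= end {
                continue; // key range does not overlap the query
            }
            // First position in this chunk whose key is >= start.
            let pos = chunk.items.partition_point(|(k, _)| *k < start);
            if pos < chunk.items.len() && chunk.items[pos].0 < end {
                heap.push(Reverse((chunk.items[pos].0, ci, pos)));
            }
        }
        let mut out = Vec::new();
        while let Some(Reverse((_, ci, pos))) = heap.pop() {
            let chunk = &self.chunks[ci];
            out.push(chunk.items[pos].clone());
            let next = pos + 1;
            if next < chunk.items.len() && chunk.items[next].0 < end {
                heap.push(Reverse((chunk.items[next].0, ci, next)));
            }
        }
        out
    }
}

/// Write eight unsorted items in chunks of three, then read keys in [3, 8).
fn demo_sorted_range() -> Vec<u64> {
    let mut store = ChunkStore::new(3);
    for key in [9u64, 2, 7, 4, 1, 8, 5, 3] {
        store.send(key, "payload");
    }
    store.flush(); // write out the final partial chunk
    store.read_range(3, 8).into_iter().map(|(k, _)| k).collect()
}
```

Each chunk is sorted only locally, so the reader has to merge across chunks to produce a globally sorted stream. The per-chunk key range is what lets a range query skip chunks that cannot contain matching items.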
See Docs for API and examples.
Note: Enable the 'full-test' feature in Release mode (for example, cargo test --release --features full-test) to turn on some long-running stress tests.