Crates.io | shard-csv |
lib.rs | shard-csv |
version | 0.1.0 |
source | src |
created_at | 2021-11-16 05:25:40.112459 |
updated_at | 2021-11-16 05:25:40.112459 |
description | A library to aid in splitting CSV/TSV files into multiple disjoint files. |
homepage | https://github.com/aeshirey/shard-csv/ |
repository | https://github.com/aeshirey/shard-csv/ |
id | 482497 |
size | 37,695 |
shard-csv is a crate that splits input CSV files into output shards according to a key selector. Use it when you have a large dataset that you want to split with more control than, say, GNU split.
Include it in your Cargo.toml with: shard-csv = "0.1.0".
Sample usage first entails creating a CSV reader. Note that shard-csv depends heavily on the csv crate, which it uses and re-exports:
    let mut reader = shard_csv::csv::ReaderBuilder::new()
        .from_path("input_data.csv")
        .expect("Failed to create reader from file");
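Then you can create a sharded CSV writer that selects a key for each row, names the output shards, splits files after a byte threshold, and runs a callback when each file is completed: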
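    // Assumes ShardedWriterBuilder and FileSplitting are in scope, e.g. imported from shard_csv.
    let mut writer = ShardedWriterBuilder::new_from_csv_reader(&mut reader)
        .expect("Failed to create writer")
        // Shard on the field at index 2, falling back to "unknown" when it is missing.
        .with_key_selector(|row| row.get(2).unwrap_or("unknown").to_string())
        // format! needs one placeholder per argument, so the name carries both key and sequence.
        .with_output_shard_naming(|key, seq| format!("data.{}.part{}.csv", key, seq))
        // Start a new file for a shard after roughly 1 MiB.
        .with_output_splitting(FileSplitting::SplitAfterBytes(1024 * 1024))
        .on_file_completion(|path, key| {
            println!("The file {} is now ready for shard {}", path.display(), key);
            // Do something more with the completed file if you want.
        });
    // Note: .ok() discards any error returned by process_csv.
    writer.process_csv(&mut reader).ok();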
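To make the idea of key-based sharding concrete, here is a minimal standard-library sketch of what the key selector, shard naming, and byte-threshold splitting amount to conceptually. It is not how shard-csv is implemented: `shard_name` and `shard_rows` are hypothetical helpers, the parsing is a naive comma split with no quoting support, shards are kept in memory rather than written to disk, and repeating the header in every output is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: build an output name from a shard key and a sequence number.
fn shard_name(key: &str, seq: usize) -> String {
    format!("data.{}.part{}.csv", key, seq)
}

/// Hypothetical helper: split `input` into in-memory shards, keyed by output name.
/// The key is the field at `key_column` ("unknown" if the row is too short).
/// Once a shard's current output has reached `max_bytes` (header included),
/// the next row for that key starts a new output with the next sequence number.
fn shard_rows(input: &str, key_column: usize, max_bytes: usize) -> BTreeMap<String, String> {
    let mut lines = input.lines();
    let header = lines.next().unwrap_or("");

    let mut outputs: BTreeMap<String, String> = BTreeMap::new();
    // Current sequence number for each key.
    let mut current_seq: BTreeMap<String, usize> = BTreeMap::new();

    for line in lines {
        // Naive field split: does not handle quoted commas.
        let key = line
            .split(',')
            .nth(key_column)
            .unwrap_or("unknown")
            .to_string();

        let seq = current_seq.entry(key.clone()).or_insert(0);
        let mut name = shard_name(&key, *seq);

        // Roll over to a new output if the current one is already full.
        if outputs.get(&name).map_or(false, |body| body.len() >= max_bytes) {
            *seq += 1;
            name = shard_name(&key, *seq);
        }

        // Assumption: every output starts with a copy of the header row.
        let body = outputs
            .entry(name)
            .or_insert_with(|| format!("{}\n", header));
        body.push_str(line);
        body.push('\n');
    }

    outputs
}
```

Calling `shard_rows(input, 2, 1024 * 1024)` on a small file with a region-like value in the field at index 2 would yield one output per distinct value, with additional `partN` outputs appearing only once an output passes the byte threshold. The crate layers streaming I/O, proper CSV parsing, and completion callbacks on top of this basic idea.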