| Crates.io | csv_lib |
| lib.rs | csv_lib |
| version | 1.0.6 |
| created_at | 2025-04-28 16:31:37.05053+00 |
| updated_at | 2025-05-15 03:07:36.888225+00 |
| description | Library for parsing CSV files using memory-mapped I/O, with low alloc, and AVX2/NEON support |
| homepage | |
| repository | https://github.com/PTechSoftware/csv_lib |
| max_upload_size | |
| id | 1652339 |
| size | 183,954 |
A high-performance, zero-copy CSV reader for Rust, optimized for extremely fast parsing using:

- Memory-mapped files (`memmap2`)
- Low-allocation reading (no `BufReader` overhead)
- A `memchr3` fallback for full CPU compatibility

csv_lib Rust Library vs other libraries

We benchmarked the processing of 1,000,000,000 CSV rows with several popular Rust libraries. Each result is the average of 3 independent runs. The test obtains each line as a `&str`. This library does not decode fields until you request a value from the field struct; by default it only yields the bytes of the line. For this benchmark we decoded the full row, which is not the most efficient way to use it.

| 🧪 Implementation | 🧵 CPU Usage | ⏱️ Average Time |
|---|---|---|
| `csv` crate | Single-core | 103.272 s |
| `csv-core` crate | Single-core | 66.767 s |
| `csv_lib` (sync mode) | Single-core | 58.963 s |
| `csv_lib` (parallel mode) | Multi-core (full) | 37.936 s |

Why csv_lib?

- Faster than `csv` and `csv-core` even in single-threaded mode.
- Uses `memmap2` and SIMD (AVX2 / NEON) for fast parsing directly from memory.

💡 `csv_lib` is optimized for sequential, chunked, and parallel processing using memory-mapped I/O and customizable parsing logic.
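
To illustrate what reading rows as borrowed bytes means, here is a minimal, std-only sketch. It is not the crate's implementation: the helper names `rows` and `field` are hypothetical, there is no memory mapping or SIMD, and quoted fields are not handled. It only shows that rows and fields can be handed out as `&[u8]` slices into the original buffer, with decoding deferred until a value is actually needed.

```rust
/// Hypothetical helper: yields each non-empty row as a slice of `data` (no copy).
fn rows(data: &[u8], line_break: u8) -> impl Iterator<Item = &[u8]> {
    data.split(move |&b| b == line_break)
        .filter(|row| !row.is_empty())
}

/// Hypothetical helper: returns the field at `index` as a slice of `row` (no copy).
/// Quoting is ignored in this sketch.
fn field(row: &[u8], delimiter: u8, index: usize) -> Option<&[u8]> {
    row.split(|&b| b == delimiter).nth(index)
}

fn main() {
    let data = b"id,city\n1,Montevideo\n2,Salto\n";
    // Skip the header row, then decode only the field we care about.
    for row in rows(data, b'\n').skip(1) {
        if let Some(bytes) = field(row, b',', 1) {
            if let Ok(city) = std::str::from_utf8(bytes) {
                println!("{city}");
            }
        }
    }
}
```

Because nothing is decoded or allocated until `from_utf8` is called, rows that are filtered out early cost only the scan for delimiters.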
Run in your terminal:

    cargo add csv_lib

If you also want FFI support:

    cargo add csv_lib --features ffi

In your project folder, at the same level as `src`, create a `.cargo/config.toml` file with the following content:

    [build]
    rustflags = ["-C", "target-cpu=native"]
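
To check what `target-cpu=native` enables on a given machine, a small standalone program along these lines can help. This is a sketch using only the standard library; the function name `simd_backend` is hypothetical and is not part of `csv_lib`. It reports both what the CPU supports at runtime and which target features the binary was compiled with.

```rust
/// Hypothetical helper: reports the best SIMD extension the running CPU offers.
fn simd_backend() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    "scalar"
}

fn main() {
    // Runtime detection: what the CPU executing this binary supports.
    println!("runtime SIMD backend: {}", simd_backend());
    // Compile-time flags: what rustc was allowed to assume when building.
    println!("compiled with avx2: {}", cfg!(target_feature = "avx2"));
    println!("compiled with neon: {}", cfg!(target_feature = "neon"));
}
```

If the compile-time line reports `false` for a feature the CPU supports, the `rustflags` setting is probably not being picked up by the build.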
We use the `Row` and `Field` structs to navigate the document. More examples are available in the `examples` folder.

    use std::collections::HashSet;
    // Also import CsvConfig, CsvReaderWithMap and Encoding from csv_lib.

    pub fn main() {
        // Create the config
        let cfg = CsvConfig::new(
            b',',
            0u8,
            b'\n',
            Encoding::Windows1252,
            false
        );
        // Open the file
        let f = match CsvReaderWithMap::open("data.csv", &cfg) {
            Ok(f) => f,
            Err(e) => panic!("{}", e)
        };
        // Example: collect the distinct values of one column of the dataset.
        // Create a hash accumulator
        let mut cities: HashSet<String> = HashSet::with_capacity(195);
        // Iterate over the rows
        while let Some(mut row) = f.next_raw() {
            // Extract the field at index 6 (indexes start at 0)
            let city = row.get_index(6);
            // Decode the bytes as &str
            let name = city.get_utf8_as_str();
            // Check first, so a String is only allocated for new values
            if !cities.contains(name) {
                cities.insert(name.to_string());
            }
        }
    }

    use std::sync::{Arc, Mutex};
    // Also import the csv_lib items used below (CsvConfig, CsvReaderWithMap,
    // Encoding, Shared, RowParallel, parallel_processing_csv).

    pub fn main() {
        // Create the config
        let cfg = CsvConfig::new(
            b',',
            0u8,
            b'\n',
            Encoding::Windows1252,
            false
        );
        // Open the file
        let f = match CsvReaderWithMap::open("data.csv", &cfg) {
            Ok(f) => f,
            Err(e) => panic!("{}", e)
        };
        // Get a slice reference to the mapped data
        let data = f.get_slice();
        // Create a shared counter
        let shared = Shared::<i32>::default();
        // Create the closure executed on each thread
        // (the Arc<Mutex<T>> type must match the type used in Shared)
        let closure = |_: &mut RowParallel<'_>, target: Arc<Mutex<i32>>| {
            // Do some work
            // ...
            // Access the shared variable. Do this after processing, because it blocks.
            // Omit this lock if you can.
            let mut lock = target.lock().unwrap();
            *lock += 1;
        };
        // Run the parallel process
        parallel_processing_csv(
            data,
            b'\n',
            b',',
            0u8,
            false,
            closure,
            shared.arc(),
        );
        println!("Iterated Lines: {}", shared.lock())
    }

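
The comment in the example above recommends avoiding the lock where possible. One common way to do that, sketched here with the standard library only, is to give each thread its own chunk and its own local counter, then merge the results once at the end. This is an illustration of the general technique, not how `parallel_processing_csv` works internally; `chunk_on_lines` and `count_rows_parallel` are hypothetical names.

```rust
use std::thread;

/// Hypothetical helper: splits `data` into roughly `parts` chunks,
/// each ending right after a line break (or at the end of the data).
fn chunk_on_lines(data: &[u8], parts: usize, line_break: u8) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let target = (data.len() / parts.max(1)).max(1);
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Extend the chunk to the next line break so no row is cut in half.
        while end < data.len() && data[end - 1] != line_break {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Counts line breaks per chunk on separate threads, with no shared lock.
fn count_rows_parallel(data: &[u8], threads: usize, line_break: u8) -> usize {
    let chunks = chunk_on_lines(data, threads, line_break);
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || chunk.iter().filter(|&&b| b == line_break).count()))
            .collect();
        // Merge the per-thread totals once, after all the work is done.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}

fn main() {
    let data = b"a,1\nb,2\nc,3\nd,4\n";
    println!("rows: {}", count_rows_parallel(data, 4, b'\n'));
}
```

Each thread touches only its own counter, so there is no contention. A per-row `Mutex` forces every row through the same lock.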
Dataset: `data_1000000000.txt` (approx. 14 GB)

| OS | Arch | CPU/Chipset | Type | Sync Avg | Multi-Core (Lock) Avg | Multi-Core Avg |
|---|---|---|---|---|---|---|
| Windows | x86_64 | i9-12900KF [Desktop] | Execution | 58,819 ms (58.82 s) | 191,619 ms (191.62 s) | 39,581 ms (39.58 s) |
| Windows | x86_64 | i7-12650H [Notebook] | Execution | 77,463 ms (77.46 s) | 216,394 ms (216.39 s) | 52,459 ms (52.46 s) |
| macOS | aarch64 | Apple M2 2022 [Notebook] | Execution | 76,337 ms (76.34 s) | 120,968 ms (120.97 s) | 73,739 ms (73.74 s) |
Check it here

- Compile in release mode. It makes a huge difference: the debug profile generates a lot of unoptimized code, and processing time becomes very poor.
- Set `force_memcach3 = false` to take advantage of SIMD (AVX2 or NEON).
- Set `delimiter`, `line_break`, and `string_separator` to match the file format.
- Work on raw bytes where possible (`&[u8]` slices).

The performance achieved was made possible by 3 crates.

Made by Ignacio Pérez Panizza 🇺🇾