read_chunk_iter

Crates.io	read_chunk_iter
lib.rs	read_chunk_iter
version	0.2.0
created_at	2023-07-30 23:09:11.96124+00
updated_at	2024-01-21 03:37:17.421535+00
description	Iterator adapters over a reader that yield fixed-size chunks at a time.
repository	https://github.com/rlee287/read_chunk_iter
id	930345
size	115,106

Ryan Lee (rlee287)

read_chunk_iter

Iterator adapters over a reader that yield fixed-size chunks at a time.

Why not use generic iterator composition over Read objects?

A simple solution would be to use iterator adapters over the bytes of a file, e.g. &BufReader::new(file).bytes().chunks(CHUNK_SIZE), using the itertools crate to provide the chunks adapter. However, generic iterator adapters are significantly slower than a dedicated iterator: the timing comparison in the examples folder shows a slowdown by a factor of 2.5-40.5. (This figure already includes the use of BufReader to reduce the number of underlying read calls. Without such buffering, bytes() would call read once for each byte, resulting in a much larger slowdown.)
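For reference, the generic approach can be approximated with the standard library alone. This is a minimal sketch, not code from this crate: chunks_via_bytes is a hypothetical helper, and it collects the chunks eagerly instead of using the lazy chunks adapter from itertools. It shows where the cost comes from: every byte passes through the iterator as its own io::Result<u8>.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical helper (not part of read_chunk_iter): groups the output of
/// `Read::bytes()` into vectors of at most `chunk_size` bytes.
fn chunks_via_bytes<R: Read>(reader: R, chunk_size: usize) -> io::Result<Vec<Vec<u8>>> {
    assert!(chunk_size > 0, "chunk_size must be nonzero");
    let mut chunks = Vec::new();
    let mut current = Vec::with_capacity(chunk_size);
    // BufReader keeps bytes() from issuing one read call per byte,
    // but each byte is still yielded and checked individually.
    for byte in BufReader::new(reader).bytes() {
        current.push(byte?);
        if current.len() == chunk_size {
            let full = std::mem::replace(&mut current, Vec::with_capacity(chunk_size));
            chunks.push(full);
        }
    }
    // The final chunk may be shorter than chunk_size.
    if !current.is_empty() {
        chunks.push(current);
    }
    Ok(chunks)
}
```

Calling chunks_via_bytes on any Read value, such as a byte slice or an opened File, returns the chunks in order, with only the last one allowed to be short.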

This crate offers two alternatives:

ChunkedReaderIter, which synchronously reads from the underlying Read object and yields chunks of data when requested.
ThreadedChunkedReaderIter, which performs the reads in a separate thread and transmits chunks of data to the originating thread.
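To illustrate what a dedicated synchronous chunk iterator looks like in principle, here is a minimal sketch built only on std. FixedChunks is a hypothetical type written for this example; it is not the implementation or the API of ChunkedReaderIter, whose constructor and item type should be taken from the crate documentation.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical dedicated chunk iterator (illustration only).
struct FixedChunks<R> {
    reader: R,
    chunk_size: usize,
    done: bool,
}

impl<R: Read> FixedChunks<R> {
    fn new(reader: R, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be nonzero");
        Self { reader, chunk_size, done: false }
    }
}

impl<R: Read> Iterator for FixedChunks<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut buf = vec![0u8; self.chunk_size];
        let mut filled = 0;
        // Keep reading until the chunk is full or the reader hits EOF.
        while filled < buf.len() {
            match self.reader.read(&mut buf[filled..]) {
                Ok(0) => {
                    self.done = true;
                    break;
                }
                Ok(n) => filled += n,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
            }
        }
        if filled == 0 {
            return None;
        }
        // The last chunk may be shorter than chunk_size.
        buf.truncate(filled);
        Some(Ok(buf))
    }
}
```

Unlike the bytes() approach, each call to next() moves a whole chunk with a handful of read calls, so the per-byte iterator overhead disappears.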

Whether to use ChunkedReaderIter or ThreadedChunkedReaderIter depends on whether the time saved by overlapping reads with other computation outweighs the overhead of threading. Benchmark your particular use case before assuming that one is necessarily better than the other.
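The trade-off is easier to see with a sketch of the threaded pattern. The following is an illustration built on std::thread and a bounded std::sync::mpsc channel, not the implementation of ThreadedChunkedReaderIter; spawn_chunk_reader and its queue_depth parameter are hypothetical names introduced here.

```rust
use std::io::{self, ErrorKind, Read};
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper (illustration only): reads fixed-size chunks on a
/// background thread and hands them to the caller through a bounded channel.
fn spawn_chunk_reader<R>(
    mut reader: R,
    chunk_size: usize,
    queue_depth: usize,
) -> Receiver<io::Result<Vec<u8>>>
where
    R: Read + Send + 'static,
{
    assert!(chunk_size > 0, "chunk_size must be nonzero");
    // The bound limits how far the reader thread can run ahead of the consumer.
    let (tx, rx) = sync_channel(queue_depth);
    thread::spawn(move || loop {
        let mut buf = vec![0u8; chunk_size];
        let mut filled = 0;
        let mut eof = false;
        while filled < chunk_size {
            match reader.read(&mut buf[filled..]) {
                Ok(0) => {
                    eof = true;
                    break;
                }
                Ok(n) => filled += n,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => {
                    // Forward the error, then stop reading.
                    let _ = tx.send(Err(e));
                    return;
                }
            }
        }
        buf.truncate(filled);
        // A failed send means the receiver was dropped, so stop early.
        if filled > 0 && tx.send(Ok(buf)).is_err() {
            return;
        }
        if eof {
            return;
        }
    });
    rx
}
```

Iterating over the returned receiver yields chunks in order until the reader thread finishes. Every chunk pays for a channel send and a thread handoff, which is the overhead that has to be outweighed by the work the consumer does while the next read is in flight.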

Features

autodetect_vectored: Automatically detects whether vectored reads offer a speedup and uses them when they do. This feature requires nightly, but manual selection of vectored reads is still possible without it.
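For readers unfamiliar with vectored reads, the sketch below shows the underlying standard-library call, Read::read_vectored, which can fill several buffers in one call. It does not show how this crate selects or uses vectored reads; read_two_buffers is a hypothetical helper for illustration.

```rust
use std::io::{self, IoSliceMut, Read};

/// Hypothetical helper (illustration only): attempts to fill two
/// caller-provided buffers with a single `read_vectored` call and
/// returns the total number of bytes read.
fn read_two_buffers<R: Read>(
    reader: &mut R,
    first: &mut [u8],
    second: &mut [u8],
) -> io::Result<usize> {
    let mut slices = [IoSliceMut::new(first), IoSliceMut::new(second)];
    // Readers without a specialized implementation fall back to a default
    // that reads into only the first non-empty buffer, so the returned
    // count can be smaller than the combined buffer length.
    reader.read_vectored(&mut slices)
}
```

Whether this is faster than repeated plain read calls depends on the reader, which is why the choice is worth detecting or benchmarking.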

Planned features

fadvise: Use posix_fadvise to signal POSIX_FADV_SEQUENTIAL for the whole file and to provide the option to free filesystem cache with POSIX_FADV_DONTNEED on yielded data. This feature will be enabled by default but will be a no-op on non-Unix systems.

Commit count: 69
