Crates.io | read_chunk_iter |
lib.rs | read_chunk_iter |
version | 0.2.0 |
source | src |
created_at | 2023-07-30 23:09:11.96124 |
updated_at | 2024-01-21 03:37:17.421535 |
description | Iterator adapters over a reader that yield fixed-size chunks at a time. |
homepage | |
repository | https://github.com/rlee287/read_chunk_iter |
max_upload_size | |
id | 930345 |
size | 115,106 |
Iterator adapters over a reader that yield fixed-size chunks at a time.
A simple solution would be to use iterator adapters over the bytes of a file, e.g. &BufReader::new(file_path).bytes().chunks(CHUNK_SIZE), using the itertools crate to provide the chunks adapter. However, generic iterator adapters are significantly slower than a dedicated iterator: the timing comparison in the examples folder demonstrates a slowdown by a factor of 2.5-40.5. (This figure already includes the use of BufReader to reduce the number of underlying read calls. Without such buffering, bytes() would call read once for each byte, resulting in a much larger slowdown.)
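To illustrate what a dedicated iterator means here, the following is a minimal std-only sketch that fills one buffer per chunk instead of assembling chunks byte by byte. The name SimpleChunkIter and its constructor are hypothetical and chosen for illustration; this is not the crate's implementation or API.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not this crate's API): yields `Vec<u8>` chunks of up
/// to `chunk_size` bytes from any reader. Only the final chunk may be shorter.
pub struct SimpleChunkIter<R: Read> {
    reader: R,
    chunk_size: usize,
    done: bool,
}

impl<R: Read> SimpleChunkIter<R> {
    pub fn new(reader: R, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be nonzero");
        Self { reader, chunk_size, done: false }
    }
}

impl<R: Read> Iterator for SimpleChunkIter<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut buf = vec![0u8; self.chunk_size];
        let mut filled = 0;
        // A single `read` call may return fewer bytes than requested, so keep
        // reading until the chunk is full or the reader reports end of input.
        while filled < self.chunk_size {
            match self.reader.read(&mut buf[filled..]) {
                Ok(0) => {
                    self.done = true;
                    break;
                }
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
            }
        }
        if filled == 0 {
            return None;
        }
        buf.truncate(filled);
        Some(Ok(buf))
    }
}
```

Wrapping a std::io::Cursor over ten bytes with a chunk size of 4 would yield chunks of 4, 4, and 2 bytes. Each chunk is filled with bulk read calls, which avoids the per-byte iterator overhead described above.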
This crate offers two alternatives:
- ChunkedReaderIter, which synchronously reads from the underlying Read object and yields chunks of data when requested.
- ThreadedChunkedReaderIter, which performs the reads in a separate thread and transmits chunks of data to the originating thread.
Whether to use ChunkedReaderIter or ThreadedChunkedReaderIter depends on whether the time saved by reading asynchronously while other computation proceeds outweighs the overhead of threading. Benchmark your particular use case before assuming that one is necessarily better than the other.
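The threaded approach can be sketched with the standard library alone: a background thread reads chunks and sends them over a bounded channel, so reads overlap with whatever the consuming thread is doing. The function spawn_chunk_reader and its queue_len parameter are hypothetical names for illustration and make no claim about how ThreadedChunkedReaderIter is actually implemented.

```rust
use std::io::{self, Read};
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper (not this crate's API): reads `chunk_size`-byte chunks
/// on a background thread and hands them to the caller through a bounded
/// channel holding at most `queue_len` pending chunks.
pub fn spawn_chunk_reader<R>(
    mut reader: R,
    chunk_size: usize,
    queue_len: usize,
) -> Receiver<io::Result<Vec<u8>>>
where
    R: Read + Send + 'static,
{
    assert!(chunk_size > 0, "chunk_size must be nonzero");
    let (tx, rx) = sync_channel(queue_len);
    thread::spawn(move || {
        loop {
            let mut buf = vec![0u8; chunk_size];
            let mut filled = 0;
            let mut eof = false;
            // Fill one chunk, tolerating short reads.
            while filled < chunk_size {
                match reader.read(&mut buf[filled..]) {
                    Ok(0) => {
                        eof = true;
                        break;
                    }
                    Ok(n) => filled += n,
                    Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                    Err(e) => {
                        // Forward the error, then stop reading.
                        let _ = tx.send(Err(e));
                        return;
                    }
                }
            }
            if filled > 0 {
                buf.truncate(filled);
                // `send` fails only if the receiver was dropped; stop then.
                if tx.send(Ok(buf)).is_err() {
                    return;
                }
            }
            if eof {
                return;
            }
        }
    });
    rx
}
```

The consuming thread can iterate over the returned receiver with rx.iter(), which ends once the reader thread finishes and drops its sender. The bounded queue limits how far the reader runs ahead, and the channel and thread costs are the threading overhead that the benchmarking advice above refers to.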
posix_fadvise to signal POSIX_FADV_SEQUENTIAL for the whole file and to provide the option to free filesystem cache with POSIX_FADV_DONTNEED on yielded data. This feature will be enabled by default but will be a no-op on non-Unix systems.