Crates.io | bytes-cast |
lib.rs | bytes-cast |
version | 0.3.0 |
source | src |
created_at | 2021-01-15 13:36:33.734205 |
updated_at | 2023-01-10 09:38:27.430513 |
description | Safely re-interpreting &[u8] bytes as custom structs without copying, for efficiently reading structured binary data. |
homepage | |
repository | https://foss.heptapod.net/octobus/rust/bytes-cast |
max_upload_size | |
id | 342383 |
size | 27,643 |
bytes-cast
Safely re-interpreting `&[u8]` bytes as custom structs without copying, for efficiently reading structured binary data.
This crate contains code derived from https://github.com/Lokathor/bytemuck.
When reading a file in a given format from disk, “traditional” parsing techniques, such as with the `nom` crate, typically involve creating a different data structure in memory, where allocation and copying can be costly.
For binary formats amenable to this, it can be more efficient to keep in memory a bytes buffer in the same format as on disk, possibly memory-mapped directly by the kernel, and only access parts of it as needed. But doing this entirely with manual index or pointer manipulation can be error-prone.
By defining `struct`s whose memory layout matches the binary format, then casting pointers to manipulate references, arrays, or slices of those structs, we can let the compiler do most of the offset computations and have much more readable code.
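As a minimal sketch of this idea using only the standard library: the `RecordHeader` type, its field layout, and the `header_from_bytes` function below are hypothetical and are not part of this crate's API. They only illustrate how a struct whose layout matches the format lets the compiler compute field offsets.

```rust
use std::mem::size_of;

/// Hypothetical 6-byte record header: a 2-byte tag followed by a 4-byte length.
/// Every field is a byte array, so the struct has alignment 1 and no padding.
#[repr(C)]
pub struct RecordHeader {
    pub tag: [u8; 2],
    /// Length stored as big-endian bytes (an assumption made for this sketch).
    pub len_be: [u8; 4],
}

/// Re-interprets the start of `bytes` as a `RecordHeader` without copying.
/// Returns `None` if the slice is too short.
pub fn header_from_bytes(bytes: &[u8]) -> Option<&RecordHeader> {
    if bytes.len() < size_of::<RecordHeader>() {
        return None;
    }
    // SAFETY: the length was checked above. `RecordHeader` is `repr(C)`,
    // has alignment 1 and no padding, and every bit pattern is valid for
    // its byte-array fields, so any 6 readable bytes form a valid value.
    Some(unsafe { &*(bytes.as_ptr() as *const RecordHeader) })
}
```

Fields are then read by name, for example `u32::from_be_bytes(header.len_be)`, instead of by slicing `bytes[2..6]` by hand.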
Some Rust types have validity constraints and must not be cast from arbitrary bytes. For example, creating a `bool` whose value in memory is not `0_u8` or `1_u8` is Undefined Behavior. The same applies to `enum`s.
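One safe alternative to casting such types is to keep the raw `u8` in the struct and validate it on access. The sketch below assumes a hypothetical `Compression` enum and helper names that do not come from this crate.

```rust
/// Converts a raw byte to `bool` only if it holds a valid bit pattern.
pub fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // any other value is not a valid `bool`
    }
}

/// Hypothetical enum with explicit one-byte discriminants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Compression {
    Stored = 0,
    Zlib = 1,
    Zstd = 2,
}

impl Compression {
    /// Checked conversion: unknown discriminants are rejected instead of
    /// being re-interpreted as an enum value.
    pub fn from_byte(byte: u8) -> Option<Self> {
        match byte {
            0 => Some(Compression::Stored),
            1 => Some(Compression::Zlib),
            2 => Some(Compression::Zstd),
            _ => None,
        }
    }
}
```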
When `align_of` for a type is greater than one, accessing values of that type at addresses that are not a multiple of the alignment is Undefined Behavior. Alignment can also cause structs to have padding, making field offsets not what we might expect. Instead, we can make helper types that wrap, for example, `[u8; 4]` and convert to/from `u32`.
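The effect of alignment on layout can be observed with `size_of` and `align_of`. The two structs below are made up for illustration. The exact numbers for `Padded` depend on the target's alignment for `u32`, which is 4 on common 64-bit platforms.

```rust
use std::mem::{align_of, size_of};

/// A native `u32` field forces the struct's alignment up and, with
/// `repr(C)`, inserts padding between `a` and `b`.
#[repr(C)]
pub struct Padded {
    pub a: u8,
    pub b: u32,
}

/// Storing the same data as raw bytes keeps the alignment at 1,
/// so there is no padding and `b` sits at offset 1.
#[repr(C)]
pub struct Unpadded {
    pub a: u8,
    pub b: [u8; 4],
}

/// Returns `(size, alignment)` for `Padded` and then for `Unpadded`.
pub fn layout_report() -> ((usize, usize), (usize, usize)) {
    (
        (size_of::<Padded>(), align_of::<Padded>()),
        (size_of::<Unpadded>(), align_of::<Unpadded>()),
    )
}
```

`Unpadded` is always 5 bytes with alignment 1, whereas `Padded` is larger wherever `u32` has an alignment above 1.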
Binary formats for storage or transmission typically mandate one of little-endian or big-endian. Helper types again can take care of conversion to and from the CPU’s native endianness.
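Such helper types could look like the following sketch. The names `U32Be` and `U32Le` and their methods are assumptions chosen for this example; see the crate documentation for the types it actually provides.

```rust
/// Hypothetical helper: a `u32` stored as big-endian bytes, with alignment 1.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct U32Be([u8; 4]);

impl U32Be {
    pub fn new(value: u32) -> Self {
        U32Be(value.to_be_bytes())
    }

    /// Converts to the CPU's native representation.
    pub fn get(self) -> u32 {
        u32::from_be_bytes(self.0)
    }

    pub fn as_bytes(&self) -> &[u8; 4] {
        &self.0
    }
}

/// Hypothetical helper: the little-endian counterpart of `U32Be`.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct U32Le([u8; 4]);

impl U32Le {
    pub fn new(value: u32) -> Self {
        U32Le(value.to_le_bytes())
    }

    /// Converts to the CPU's native representation.
    pub fn get(self) -> u32 {
        u32::from_le_bytes(self.0)
    }

    pub fn as_bytes(&self) -> &[u8; 4] {
        &self.0
    }
}
```

Because the byte order is part of the type, a struct field declared as `U32Be` documents the format and cannot be read with the wrong endianness by accident.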
By default the Rust compiler can choose to reorder struct fields (in order to reduce padding). This again can make field offsets not what we’d expect. Reordering can be disabled by marking a struct with `#[repr(C)]` or `#[repr(transparent)]`.
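To illustrate, the sketch below measures field offsets of a `#[repr(C)]` struct whose fields all have alignment 1. The `FileHeader` type is hypothetical. With `#[repr(C)]` the fields are laid out in declaration order, so the offsets are simply the running sum of the preceding field sizes.

```rust
use std::mem::size_of;

/// Hypothetical file header; `repr(C)` keeps fields in declaration order.
#[repr(C)]
pub struct FileHeader {
    pub magic: [u8; 4],
    pub version: u8,
    pub flags: [u8; 2],
}

/// Returns the byte offsets of `magic`, `version`, and `flags`,
/// computed from the addresses of the fields of a live value.
pub fn field_offsets() -> (usize, usize, usize) {
    let header = FileHeader {
        magic: *b"DEMO",
        version: 1,
        flags: [0, 0],
    };
    let base = &header as *const FileHeader as usize;
    let magic = &header.magic as *const [u8; 4] as usize - base;
    let version = &header.version as *const u8 as usize - base;
    let flags = &header.flags as *const [u8; 2] as usize - base;
    (magic, version, flags)
}

/// Total size of the header: 4 + 1 + 2 bytes, with no padding.
pub fn header_size() -> usize {
    size_of::<FileHeader>()
}
```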
This crate combines Rust’s checks for all of the above at compile-time. See the documentation for API details.
`bytemuck` and other projects already exist with very similar goals. This crate makes some different design choices and is opinionated in some ways:
- It only converts from `&[u8]` bytes and does not try to be more general or accommodate many use cases.
- Providing more bytes than necessary is not an error. Instead the start of the slice is re-interpreted, and the remaining bytes are part of the return value for further processing. (The caller can check or assert `remaining.is_empty()` if an exact length is desired.)
- It mandates `align_of() == 1` at compile-time instead of checking pointer alignment at runtime, removing one category of panics or errors that needs to be handled. Not enough bytes is the only error case. Fields with `align_of() == 1` also remove any padding in structs.
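These design choices can be sketched with a hand-written function over one concrete type. This is not the crate's implementation or API: `Entry`, `NotEnoughBytes`, `entry_from_bytes`, and `sum_values` are names invented for this example.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical 3-byte entry: a kind byte and a little-endian `u16` value.
#[repr(C)]
pub struct Entry {
    pub kind: u8,
    pub value_le: [u8; 2],
}

// Compile-time checks: the build fails if `Entry` ever gains an
// alignment requirement or padding.
const _: () = assert!(align_of::<Entry>() == 1);
const _: () = assert!(size_of::<Entry>() == 3);

/// The only way the conversion can fail: the input is too short.
#[derive(Debug, PartialEq, Eq)]
pub struct NotEnoughBytes;

/// Re-interprets the start of `bytes` as an `Entry` and returns it
/// together with the remaining bytes. Extra bytes are not an error.
pub fn entry_from_bytes(bytes: &[u8]) -> Result<(&Entry, &[u8]), NotEnoughBytes> {
    if bytes.len() < size_of::<Entry>() {
        return Err(NotEnoughBytes);
    }
    let (head, remaining) = bytes.split_at(size_of::<Entry>());
    // SAFETY: `head` is exactly `size_of::<Entry>()` bytes long, and `Entry`
    // has alignment 1, no padding, and no invalid bit patterns.
    let entry = unsafe { &*(head.as_ptr() as *const Entry) };
    Ok((entry, remaining))
}

/// Reads entries back to back until the input is exhausted and sums
/// their values. A trailing partial entry is reported as an error.
pub fn sum_values(mut bytes: &[u8]) -> Result<u32, NotEnoughBytes> {
    let mut total = 0u32;
    while !bytes.is_empty() {
        let (entry, remaining) = entry_from_bytes(bytes)?;
        total += u32::from(u16::from_le_bytes(entry.value_le));
        bytes = remaining;
    }
    Ok(total)
}
```

Returning the remainder makes it natural to parse a sequence of records in a loop, and because alignment is settled at compile time, the only runtime check left is the length comparison.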