zerovec

Crates.io: zerovec
lib.rs: zerovec
version: 0.10.4
source: src
created_at: 2021-04-19 18:57:13.970932
updated_at: 2024-06-27 17:17:17.434294
description: Zero-copy vector backed by a byte array
repository: https://github.com/unicode-org/icu4x
id: 386783
size: 622,319
icu4x-release (github:unicode-org:icu4x-release)


Zero-copy vector abstractions for arbitrary types, backed by byte slices.

zerovec enables a far wider range of types (beyond just &[u8] and &str) to participate in zero-copy deserialization from byte slices. It is serde compatible and comes equipped with proc macros.

Clients upgrading to zerovec benefit from zero heap allocations when deserializing read-only data.

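This crate has four main types:

  • ZeroVec<'a, T> for fixed-width types like u32
  • VarZeroVec<'a, T> for variable-width types like str
  • ZeroMap<'a, K, V> to map from K to V
  • ZeroMap2d<'a, K0, K1, V> to map from the pair (K0, K1) to V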

The first two are intended as close-to-drop-in replacements for Vec<T> in Serde structs. The third and fourth are intended as a replacement for HashMap or LiteMap. When used with Serde derives, be sure to apply #[serde(borrow)] to these types, same as one would for Cow<'a, T>.

ZeroVec<'a, T>, VarZeroVec<'a, T>, ZeroMap<'a, K, V>, and ZeroMap2d<'a, K0, K1, V> all behave like Cow<'a, T> in that they abstract over either borrowed or owned data. When deserializing from human-readable formats (like JSON and XML), these types will typically allocate and fully own their data. When deserializing from binary formats like bincode and postcard, they borrow data directly from the buffer being deserialized, avoiding allocations and performing only validity checks. As a result, deserialization with this crate can be very fast (see the Performance section below).
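
The borrowed-versus-owned behavior can be illustrated with std's Cow alone. This is a minimal sketch and not the crate's implementation: from_binary and from_text are hypothetical helpers, and the "format" is assumed to be a run of packed little-endian u32 values.

```rust
use std::borrow::Cow;

/// Binary path: the input bytes are assumed to already be in the stored
/// layout, so they are only validated and then borrowed (no allocation).
fn from_binary(bytes: &[u8]) -> Option<Cow<'_, [u8]>> {
    // Validity check: the buffer must hold a whole number of 4-byte values.
    if bytes.len() % 4 == 0 {
        Some(Cow::Borrowed(bytes))
    } else {
        None
    }
}

/// Human-readable path: text such as "1, 2, 3" has to be parsed and
/// re-encoded, so the result owns a freshly allocated buffer.
fn from_text(text: &str) -> Option<Cow<'static, [u8]>> {
    let mut out = Vec::new();
    for part in text.split(',') {
        let n: u32 = part.trim().parse().ok()?;
        out.extend_from_slice(&n.to_le_bytes());
    }
    Some(Cow::Owned(out))
}
```

Both helpers produce the same bytes for the same numbers; only the binary path avoids allocating, which mirrors the distinction the paragraph above draws between bincode/postcard and JSON/XML.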

See the design doc for details on how this crate works under the hood.

Cargo features

This crate has several optional Cargo features:

  • serde: Allows serializing and deserializing zerovec's abstractions via serde
  • yoke: Enables implementations of Yokeable from the yoke crate, which is also useful in situations involving a lot of zero-copy deserialization.
  • derive: Makes it easier to use custom types in these collections by providing the #[make_ule] and #[make_varule] proc macros, which generate appropriate ULE and VarULE-conformant types for a given "normal" type.
  • std: Enables std::error::Error implementations for error types. This crate is by default no_std with a dependency on alloc.

Examples

Serialize and deserialize a struct with ZeroVec and VarZeroVec with Bincode:

use zerovec::{VarZeroVec, ZeroVec};

// This example requires the "serde" feature
#[derive(serde::Serialize, serde::Deserialize)]
pub struct DataStruct<'data> {
    #[serde(borrow)]
    nums: ZeroVec<'data, u32>,
    #[serde(borrow)]
    chars: ZeroVec<'data, char>,
    #[serde(borrow)]
    strs: VarZeroVec<'data, str>,
}

let data = DataStruct {
    nums: ZeroVec::from_slice_or_alloc(&[211, 281, 421, 461]),
    chars: ZeroVec::alloc_from_slice(&['ö', '冇', 'म']),
    strs: VarZeroVec::from(&["hello", "world"]),
};
let bincode_bytes =
    bincode::serialize(&data).expect("Serialization should be successful");
assert_eq!(bincode_bytes.len(), 67);

let deserialized: DataStruct = bincode::deserialize(&bincode_bytes)
    .expect("Deserialization should be successful");
assert_eq!(deserialized.nums.first(), Some(211));
assert_eq!(deserialized.chars.get(1), Some('冇'));
assert_eq!(deserialized.strs.get(1), Some("world"));
// The deserialization will not have allocated anything
assert!(!deserialized.nums.is_owned());

Use custom types inside of ZeroVec:

use zerovec::{ZeroVec, VarZeroVec, ZeroMap};
use std::borrow::Cow;
use zerovec::ule::encode_varule_to_box;

// custom fixed-size ULE type for ZeroVec
#[zerovec::make_ule(DateULE)]
#[derive(Copy, Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
struct Date {
    y: u64,
    m: u8,
    d: u8
}

// custom variable sized VarULE type for VarZeroVec
#[zerovec::make_varule(PersonULE)]
#[zerovec::derive(Serialize, Deserialize)] // add Serde impls to PersonULE
#[derive(Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
struct Person<'a> {
    birthday: Date,
    favorite_character: char,
    #[serde(borrow)]
    name: Cow<'a, str>,
}

#[derive(serde::Serialize, serde::Deserialize)]
struct Data<'a> {
    #[serde(borrow)]
    important_dates: ZeroVec<'a, Date>,
    // note: VarZeroVec always must reference the ULE type directly
    #[serde(borrow)]
    important_people: VarZeroVec<'a, PersonULE>,
    #[serde(borrow)]
    birthdays_to_people: ZeroMap<'a, Date, PersonULE>
}


let person1 = Person {
    birthday: Date { y: 1990, m: 9, d: 7},
    favorite_character: 'π',
    name: Cow::from("Kate")
};
let person2 = Person {
    birthday: Date { y: 1960, m: 5, d: 25},
    favorite_character: '冇',
    name: Cow::from("Jesse")
};

let important_dates = ZeroVec::alloc_from_slice(&[
    Date { y: 1943, m: 3, d: 20 },
    Date { y: 1976, m: 8, d: 2 },
    Date { y: 1998, m: 2, d: 15 },
]);
let important_people = VarZeroVec::from(&[&person1, &person2]);
let mut birthdays_to_people: ZeroMap<Date, PersonULE> = ZeroMap::new();
// `.insert_var_v()` is slightly more convenient than `.insert()` for custom ULE types
birthdays_to_people.insert_var_v(&person1.birthday, &person1);
birthdays_to_people.insert_var_v(&person2.birthday, &person2);

let data = Data { important_dates, important_people, birthdays_to_people };

let bincode_bytes = bincode::serialize(&data)
    .expect("Serialization should be successful");
assert_eq!(bincode_bytes.len(), 168);

let deserialized: Data = bincode::deserialize(&bincode_bytes)
    .expect("Deserialization should be successful");

assert_eq!(deserialized.important_dates.get(0).unwrap().y, 1943);
assert_eq!(&deserialized.important_people.get(1).unwrap().name, "Jesse");
assert_eq!(&deserialized.important_people.get(0).unwrap().name, "Kate");
assert_eq!(&deserialized.birthdays_to_people.get(&person1.birthday).unwrap().name, "Kate");

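
To give a feel for what a fixed-size, byte-level counterpart of Date could look like, here is a hand-written sketch in plain std Rust. It is not the code that #[make_ule] generates: DateBytes, from_date, and to_date are hypothetical names, and the layout (8 little-endian bytes for y, then m, then d) is an assumption made for illustration.

```rust
/// Plain "normal" type, mirroring the `Date` struct in the example above.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct Date {
    y: u64,
    m: u8,
    d: u8,
}

/// Hypothetical byte-level form: 8 little-endian bytes for `y`, then `m`,
/// then `d`. Its alignment is 1, so it can sit at any offset in a buffer.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[repr(transparent)]
struct DateBytes([u8; 10]);

impl DateBytes {
    /// Encodes a `Date` into its 10-byte form.
    fn from_date(date: Date) -> Self {
        let mut raw = [0u8; 10];
        raw[..8].copy_from_slice(&date.y.to_le_bytes());
        raw[8] = date.m;
        raw[9] = date.d;
        DateBytes(raw)
    }

    /// Decodes the 10-byte form back into a `Date`.
    fn to_date(self) -> Date {
        let mut y = [0u8; 8];
        y.copy_from_slice(&self.0[..8]);
        Date { y: u64::from_le_bytes(y), m: self.0[8], d: self.0[9] }
    }
}
```

Because every field is stored as raw bytes, a slice of such values can be viewed directly inside a deserialization buffer without copying or realigning it.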

Performance

zerovec is designed for fast deserialization from byte buffers with zero memory allocations while minimizing performance regressions for common vector operations.

Benchmark results on x86_64:

| Operation                                              | Vec<T>     | zerovec   |
|--------------------------------------------------------|------------|-----------|
| Deserialize vec of 100 u32                             | 233.18 ns  | 14.120 ns |
| Compute sum of vec of 100 u32 (read every element)     | 8.7472 ns  | 10.775 ns |
| Binary search vec of 1000 u32 50 times                 | 442.80 ns  | 472.51 ns |
| Deserialize vec of 100 strings                         | 7.3740 μs* | 1.4495 μs |
| Count chars in vec of 100 strings (read every element) | 747.50 ns  | 955.28 ns |
| Binary search vec of 500 strings 10 times              | 466.09 ns  | 790.33 ns |

* This result is reported for Vec<String>. However, Serde also supports deserializing to the partially-zero-copy Vec<&str>; this gives 1.8420 μs, much faster than Vec<String> but a bit slower than zerovec.
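
The pattern in these numbers (much faster deserialization, slightly slower reads) is what one would expect if elements stay as bytes in the buffer and are decoded on each access. A minimal sketch of that idea in plain std Rust, assuming packed little-endian u32 values; get_u32 and sum_u32 are hypothetical helpers, not the crate's API:

```rust
/// Reads element `idx` from a buffer of packed little-endian u32 values.
/// No alignment is required of `bytes`; the value is decoded on access.
fn get_u32(bytes: &[u8], idx: usize) -> Option<u32> {
    let start = idx.checked_mul(4)?;
    let end = start.checked_add(4)?;
    let chunk = bytes.get(start..end)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(chunk);
    Some(u32::from_le_bytes(raw))
}

/// Sums every element, decoding each 4-byte chunk as it goes.
/// Trailing bytes that do not form a full chunk are ignored.
fn sum_u32(bytes: &[u8]) -> u64 {
    bytes
        .chunks_exact(4)
        .map(|c| u64::from(u32::from_le_bytes([c[0], c[1], c[2], c[3]])))
        .sum()
}
```

"Deserializing" here amounts to checking the buffer length and keeping a reference, while each read pays for a small byte-to-integer conversion.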

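| Operation                              | HashMap<K,V> | LiteMap<K,V> | ZeroMap<K,V> |
|----------------------------------------|--------------|--------------|--------------|
| Deserialize a small map                | 2.72 μs      | 1.28 μs      | 480 ns       |
| Deserialize a large map                | 50.5 ms      | 18.3 ms      | 3.74 ms      |
| Look up from a small deserialized map  | 49 ns        | 42 ns        | 54 ns        |
| Look up from a large deserialized map  | 51 ns        | 155 ns       | 213 ns       |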

Small = 16 elements, large = 131,072 elements. Maps contain <String, String>.
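
One plausible reading of the lookup numbers is that HashMap lookups stay roughly constant as the map grows, while a map stored as sorted keys is searched with binary search and so slows down logarithmically. The sketch below shows that lookup strategy in plain std Rust; SortedMap is a hypothetical type written for illustration and says nothing about how the crate lays out its data.

```rust
/// Hypothetical sorted map: `keys` is sorted ascending and `values[i]`
/// is the value belonging to `keys[i]`. Both slices are borrowed.
struct SortedMap<'a> {
    keys: &'a [&'a str],
    values: &'a [&'a str],
}

impl<'a> SortedMap<'a> {
    /// Looks up `key` with a binary search over the sorted keys,
    /// taking O(log n) comparisons and performing no allocation.
    fn get(&self, key: &str) -> Option<&'a str> {
        let idx = self.keys.binary_search_by(|probe| (*probe).cmp(key)).ok()?;
        self.values.get(idx).copied()
    }
}
```

With 16 elements a lookup needs about 4 comparisons; with 131,072 elements it needs about 17, each a string comparison, which is in line with the gap between the small and large rows above.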

The benches used to generate the above table can be found in the benches directory in the project repository. zeromap benches are named by convention, e.g. zeromap/deserialize/small, zeromap/lookup/large. The type is appended for baseline comparisons, e.g. zeromap/lookup/small/hashmap.

More Information

For more information on development, authorship, contributing, etc., please visit the ICU4X home page.

