Crates.io | borsh-v |
lib.rs | borsh-v |
version | 0.7.3 |
source | src |
created_at | 2020-10-31 09:03:14.881599 |
updated_at | 2020-10-31 09:07:26.958019 |
description | Binary Object Representation Serializer for Hashing |
homepage | http://borsh.io |
repository | https://github.com/nearprotocol/borsh |
id | 307199 |
size | 85,833 |
borsh
Binary Object Representation Serializer for Hashing
Why do we need yet another serialization format? Borsh is the first serializer that prioritizes the following qualities that are crucial for security-critical projects:
unsafe to avoid an exhaustion attack;
use borsh::{BorshSerialize, BorshDeserialize};
#[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
struct A {
    x: u64,
    y: String,
}
#[test]
fn test_simple_struct() {
    let a = A {
        x: 3301,
        y: "liber primus".to_string(),
    };
    let encoded_a = a.try_to_vec().unwrap();
    let decoded_a = A::try_from_slice(&encoded_a).unwrap();
    assert_eq!(a, decoded_a);
}
Opting out of Serde allows borsh to have some features that are currently not available in serde-compatible serializers.
Currently we support two features: borsh_init and borsh_skip (the former is not available in Serde).
borsh_init automatically runs an initialization function right after deserialization. This is convenient for objects that are designed to be used as strictly immutable. Usage example:
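use borsh::{BorshSerialize, BorshDeserialize};
// CryptoKey, CryptoSignature and CryptoHash stand for application-defined types.
#[derive(BorshSerialize, BorshDeserialize)]
#[borsh_init(init)]
struct Message {
    message: String,
    timestamp: u64,
    public_key: CryptoKey,
    signature: CryptoSignature,
    hash: CryptoHash,
}
impl Message {
    // Runs right after deserialization: recompute the hash, then verify the signature.
    pub fn init(&mut self) {
        self.hash = CryptoHash::new().write_string(&self.message).write_u64(self.timestamp);
        self.signature.verify(&self.hash, &self.public_key);
    }
}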
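The effect of borsh_init can be sketched without the derive macro: decode the stored fields, then call the initialization function before handing the object back. The following is a hand-written illustration that uses only the standard library. The simplified Message layout, the checksum field, and the read_message function are hypothetical, and this is not the code the derive macro generates.

```rust
use std::convert::TryInto;

// Hypothetical message type: `checksum` is derived data that is
// recomputed after decoding instead of being read from the input.
#[derive(Debug, PartialEq)]
struct Message {
    message: String,
    timestamp: u64,
    checksum: u64,
}

impl Message {
    // Plays the role of the function named in #[borsh_init(init)].
    fn init(&mut self) {
        let byte_sum: u64 = self.message.bytes().map(u64::from).sum();
        self.checksum = byte_sum.wrapping_add(self.timestamp);
    }
}

// Reads `message` (u32 length + UTF-8 bytes) and `timestamp` (u64,
// little-endian), then runs `init` right after the fields are decoded.
// Returns None on truncated or invalid input.
fn read_message(buf: &[u8]) -> Option<Message> {
    let len = u32::from_le_bytes(buf.get(..4)?.try_into().ok()?) as usize;
    let text_end = 4usize.checked_add(len)?;
    let text = std::str::from_utf8(buf.get(4..text_end)?).ok()?;
    let ts_end = text_end.checked_add(8)?;
    let timestamp = u64::from_le_bytes(buf.get(text_end..ts_end)?.try_into().ok()?);
    let mut msg = Message { message: text.to_string(), timestamp, checksum: 0 };
    msg.init();
    Some(msg)
}
```

Callers of read_message never see a Message whose derived field is unset, which is the convenience the attribute is meant to provide for objects treated as immutable.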
borsh_skip skips serializing and deserializing a field, provided the field's type implements the Default trait, similarly to #[serde(skip)].
#[derive(BorshSerialize, BorshDeserialize)]
struct A {
    x: u64,
    #[borsh_skip]
    y: f32,
}
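In terms of bytes, skipping a field means it contributes nothing to the output and is filled from Default when reading. The sketch below hand-writes that behaviour for the struct above with the standard library only. The write_a and read_a functions are hypothetical stand-ins for what a derive would produce, under the assumption that the skipped field is simply omitted from the byte stream.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
struct A {
    x: u64,
    y: f32, // treated as the skipped field in this sketch
}

// Writes only `x` (little-endian); the skipped field adds no bytes.
fn write_a(a: &A) -> Vec<u8> {
    a.x.to_le_bytes().to_vec()
}

// Reads `x` and fills the skipped field from its Default impl.
// Returns None if fewer than 8 bytes are available.
fn read_a(buf: &[u8]) -> Option<A> {
    let x = u64::from_le_bytes(buf.get(..8)?.try_into().ok()?);
    Some(A { x, y: f32::default() })
}

#[test]
fn skipped_field_is_reset_to_default() {
    let bytes = write_a(&A { x: 3301, y: 1.5 });
    assert_eq!(bytes.len(), 8);
    assert_eq!(read_a(&bytes), Some(A { x: 3301, y: 0.0 }));
}
```

Note that the value of y before serialization (1.5 here) does not survive the round trip, so a skipped field should only hold data that can be recomputed or safely reset.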
We benchmarked the objects that blockchain projects care about the most: blocks, block headers, transactions, and accounts. We took the object structures from the nearprotocol blockchain and used Criterion to build the following graphs.
The benchmarks were run on Google Cloud n1-standard-2 (2 vCPUs, 7.5 GB memory).
Block header serialization speed vs block header size in bytes (size only roughly corresponds to the serialization complexity, which causes the non-smoothness of the graph):
Block header deserialization speed vs block header size in bytes:
Block serialization speed vs block size in bytes:
Block deserialization speed vs block size in bytes:
See complete report here.
In short, Borsh is a non-self-describing binary serialization format. It is designed to serialize any object to a canonical and deterministic sequence of bytes.
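One way to see what "canonical and deterministic" means in practice: two maps holding the same entries must produce identical bytes, regardless of insertion or iteration order. The sketch below hand-encodes a HashMap<String, u32> with the standard library only, sorting entries by key as the HashMap row of the formal specification describes. The function name encode_map is hypothetical, and this is an illustration of the idea, not the borsh crate's implementation.

```rust
use std::collections::HashMap;

// Encodes a HashMap<String, u32>: entry count as u32, then the entries
// in ascending key order, so the output does not depend on the map's
// iteration order. Lengths are cast to u32 without overflow checks,
// which is acceptable only for a sketch.
fn encode_map(map: &HashMap<String, u32>) -> Vec<u8> {
    let mut entries: Vec<(&String, &u32)> = map.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));

    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for (key, value) in entries {
        // Key: UTF-8 length as u32, then the bytes.
        out.extend_from_slice(&(key.len() as u32).to_le_bytes());
        out.extend_from_slice(key.as_bytes());
        // Value: little-endian u32.
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

#[test]
fn insertion_order_does_not_change_bytes() {
    let mut first = HashMap::new();
    first.insert("b".to_string(), 2u32);
    first.insert("a".to_string(), 1u32);

    let mut second = HashMap::new();
    second.insert("a".to_string(), 1u32);
    second.insert("b".to_string(), 2u32);

    assert_eq!(encode_map(&first), encode_map(&second));
}
```

Because equal objects always map to the same bytes, hashing the encoded output gives a stable identifier for the object.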
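General principles:
integers are little endian;
sizes of dynamic containers are written before the values as u32;
unordered containers (HashMap, HashSet) are written in sorted order;
struct fields are serialized in the order in which they are declared;
enums are serialized using u8 for the enum ordinal and then storing data inside the enum value (if present).
Formal specification: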
Informal type | Rust EBNF * | Pseudocode |
Integers | integer_type: ["u8" | "u16" | "u32" | "u64" | "u128" | "i8" | "i16" | "i32" | "i64" | "i128" ] | little_endian(x) |
Floats | float_type: ["f32" | "f64" ] | err_if_nan(x) little_endian(x as integer_type) |
Unit | unit_type: "()" | We do not write anything |
Fixed sized arrays | array_type: '[' ident ';' literal ']' | for el in x repr(el as ident) |
Dynamic sized array | vec_type: "Vec<" ident '>' | repr(x.len() as u32) for el in x repr(el as ident) |
Struct | struct_type: "struct" ident fields | repr(fields) |
Fields | fields: [named_fields | unnamed_fields] | |
Named fields | named_fields: '{' ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ... '}' | repr(ident_field0 as ident_type0) repr(ident_field1 as ident_type1) ... |
Unnamed fields | unnamed_fields: '(' ident_type0 ',' ident_type1 ',' ... ')' | repr(x.0 as type0) repr(x.1 as type1) ... |
Enum | enum: 'enum' ident '{' variant0 ',' variant1 ',' ... '}' variant: ident [ fields ] ? | Suppose X is the number of the variant that the enum takes. repr(X as u8) repr(x.X as fieldsX) |
HashMap | hashmap: "HashMap<" ident0, ident1 ">" | repr(x.len() as u32) for (k, v) in x.sorted_by_key() { repr(k as ident0) repr(v as ident1) } |
HashSet | hashset: "HashSet<" ident ">" | repr(x.len() as u32) for el in x.sorted() { repr(el as ident) } |
Option | option_type: "Option<" ident '>' | if x.is_some() { repr(1 as u8) repr(x.unwrap() as ident) } else { repr(0 as u8) } |
String | string_type: "String" | encoded = utf8_encoding(x) as Vec<u8> repr(encoded.len() as u32) repr(encoded as Vec<u8>) |
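A few rows of the table can be written out by hand to make the byte layout concrete. The sketch below follows the Integers, String, Option, Floats, and Struct rows using only the standard library. All function names are hypothetical, string lengths are cast to u32 without overflow checks, and the code is an illustration of the pseudocode, not the borsh crate's implementation.

```rust
// Integers: little-endian bytes.
fn write_u64(out: &mut Vec<u8>, x: u64) {
    out.extend_from_slice(&x.to_le_bytes());
}

// String: UTF-8 byte length as u32, then the bytes.
fn write_string(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

// Option: one tag byte (0 or 1), then the value if present.
fn write_option_u64(out: &mut Vec<u8>, x: Option<u64>) {
    match x {
        Some(v) => {
            out.push(1);
            write_u64(out, v);
        }
        None => out.push(0),
    }
}

// Floats: reject NaN, then write the bit pattern little-endian.
fn write_f32(out: &mut Vec<u8>, x: f32) -> Result<(), String> {
    if x.is_nan() {
        return Err("NaN cannot be serialized".to_string());
    }
    out.extend_from_slice(&x.to_bits().to_le_bytes());
    Ok(())
}

// Struct with fields x: u64 and y: String: fields are written in
// declaration order, with no field names or type tags in the output.
fn encode_a(x: u64, y: &str) -> Vec<u8> {
    let mut out = Vec::new();
    write_u64(&mut out, x);
    write_string(&mut out, y);
    out
}
```

With these helpers, encode_a(3301, "liber primus") yields 24 bytes: 8 for the integer (starting E5 0C, since 3301 is 0x0CE5), 4 for the length 12, and 12 for the text.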
Note:
Instead of defining EBNF repetitions as [ ident_field ':' ident_type ',' ] * we define them as ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ... so that we can refer to individual elements in the pseudocode;
We use the repr() function to denote that we are writing the representation of the given element into an imaginary buffer.
After you have merged your change into the master branch and bumped the versions of all three crates, it is time to officially release the new version.
Make sure borsh, borsh-derive and borsh-derive-internal all have the new crate versions. Then navigate to each folder and run (in the given order):
cd ../borsh-derive-internal; cargo publish
cd ../borsh-derive; cargo publish
cd ../borsh; cargo publish
Make sure you are on the master branch, then tag the code and push the tag:
git tag -a v9.9.9 -m "My superawesome change."
git push origin v9.9.9