oasis-borsh

Crates.iooasis-borsh
lib.rsoasis-borsh
version0.2.12
sourcesrc
created_at2020-01-15 19:24:27.804471
updated_at2021-01-19 22:29:10.532178
descriptionBinary Object Representation Serializer for Hashing
homepagehttp://borsh.io
repositoryhttps://github.com/nearprotocol/borsh
max_upload_size
id198816
size38,094
Nick Hynes (nhynes)

documentation

README

borsh

Binary Object Representation Serializer for Hashing

npm npm Crates.io version Download Join the community on Discord Apache 2.0 License Travis Build

Website | Example | Features | Benchmarks | Specification | Releasing

Why do we need yet another serialization format? Borsh is the first serializer that prioritizes the following qualities that are crucial for security-critical projects:

  • Consistent and specified binary representation:
    • Consistent means there is a bijective mapping between objects and their binary representations. There is no two binary representations that deserialize into the same object. This is extremely useful for applications that use binary representation to compute hash;
    • Borsh comes with a full specification that can be used for implementations in other languages;
  • Safe. Borsh implementations use safe coding practices. In Rust, Borsh uses almost only safe code, with one exception usage of unsafe to avoid an exhaustion attack;
  • Speed. In Rust, Borsh achieves high performance by opting out from Serde which makes it faster than bincode in some cases; which also reduces the code size.

Example

use oasis_borsh::{BorshSerialize, BorshDeserialize};

#[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
struct A {
    x: u64,
    y: String,
}

#[test]
fn test_simple_struct() {
    let a = A {
        x: 3301,
        y: "liber primus".to_string(),
    };
    let encoded_a = a.try_to_vec().unwrap();
    let decoded_a = A::try_from_slice(&encoded_a).unwrap();
    assert_eq!(a, decoded_a);
}

Features

Opting out from Serde allows borsh to have some features that currently are not available for serde-compatible serializers. Currently we support two features: borsh_init and borsh_skip (the former one not available in Serde).

borsh_init allows to automatically run an initialization function right after deserialization. This adds a lot of convenience for objects that are architectured to be used as strictly immutable. Usage example:

#[derive(BorshSerialize, BorshDeserialize)]
#[borsh_init(init)]
struct Message {
    message: String,
    timestamp: u64,
    public_key: CryptoKey,
    signature: CryptoSignature
    hash: CryptoHash
}

impl Message {
    pub fn init(&mut self) {
        self.hash = CryptoHash::new().write_string(self.message).write_u64(self.timestamp);
        self.signature.verify(self.hash, self.public_key);
    }
}

borsh_skip allows to skip serializing/deserializing fields, assuming they implement Default trait, similary to #[serde(skip)].

#[derive(BorshSerialize, BorshDeserialize)]
struct A {
    x: u64,
    #[borsh_skip]
    y: f32,
}

Benchmarks

We measured the following benchmarks on objects that blockchain projects care about the most: blocks, block headers, transactions, accounts. We took object structure from the nearprotocol blockchain. We used Criterion for building the following graphs.

The benchmarks were run on Google Cloud n1-standard-2 (2 vCPUs, 7.5 GB memory).

Block header serialization speed vs block header size in bytes (size only roughly corresponds to the serialization complexity which causes non-smoothness of the graph):

ser_header

Block header de-serialization speed vs block header size in bytes:

ser_header

Block serialization speed vs block size in bytes:

ser_header

Block de-serialization speed vs block size in bytes:

ser_header

See complete report here.

Specification

In short, Borsh is a non self-describing binary serialization format. It is designed to serialize any objects to canonical and deterministic set of bytes.

General principles:

  • integers are little endian;
  • sizes of dynamic containers are written before values as u32;
  • all unordered containers (hashmap/hashset) are ordered in lexicographic order by key (in tie breaker case on value);
  • structs are serialized in the order of fields in the struct;
  • enums are serialized with using u8 for the enum ordinal and then storing data inside the enum value (if present).

Formal specification:

Informal typeRust EBNF * Pseudocode
Integers integer_type: ["u8" | "u16" | "u32" | "u64" | "u128" | "i8" | "i16" | "i32" | "i64" | "i128" ] little_endian(x)
Floats float_type: ["f32" | "f64" ] err_if_nan(x)
little_endian(x as integer_type)
Unit unit_type: "()"We do not write anything
Fixed sized arrays array_type: '[' ident ';' literal ']' for el in x
  repr(el as ident)
Dynamic sized array vec_type: "Vec<" ident '>' repr(len() as u32)
for el in x
  repr(el as ident)
Struct struct_type: "struct" ident fields repr(fields)
Fields fields: [named_fields | unnamed_fields]
Named fields named_fields: '{' ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ... '}' repr(ident_field0 as ident_type0)
repr(ident_field1 as ident_type1)
...
Unnamed fields unnamed_fields: '(' ident_type0 ',' ident_type1 ',' ... ')'repr(x.0 as type0)
repr(x.1 as type1)
...
Enum enum: 'enum' ident '{' variant0 ',' variant1 ',' ... '}'
variant: ident [ fields ] ?
Suppose X is the number of the variant that the enum takes.
repr(X as u8)
repr(x.X as fieldsX)
HashMaphashmap: "HashMap<" ident0, ident1 ">" repr(x.len() as u32)
for (k, v) in x.sorted_by_key() {
  repr(k as ident0)
  repr(v as ident1)
}
HashSethashset: "HashSet<" ident ">" repr(x.len() as u32)
for el in x.sorted() {
 repr(el as ident)
}
Option option_type: "Option<" ident '>' if x.is_some() {
  repr(1 as u8)
  repr(x.unwrap() as ident)
} else {
  repr(0 as u8)
}
String string_type: "String" encoded = utf8_encoding(x) as Vec<u8>
repr(encoded.len() as u32)
repr(encoded as Vec<u8>)

Note:

  • Some parts of Rust grammar are not yet formalized, like enums and variants. We backwards derive EBNF forms of Rust grammar from syn types;
  • We had to extend repetitions of EBNF and instead of defining them as [ ident_field ':' ident_type ',' ] * we define them as ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ... so that we can refer to individual elements in the pseudocode;
  • We use repr() function to denote that we are writing the representation of the given element into an imaginary buffer.

Releasing

After you merged your change into the master branch and bumped the versions of all three crates it is time to officially release the new version.

Make sure borsh, borsh-derive and borsh-derive-internal all have the new crate versions. Then navigate to each folder and run (in the given order):

cd ../borsh-derive-internal; cargo publish
cd ../borsh-derive; cargo publish
cd ../borsh; cargo publish

Make sure you are on the master branch, then tag the code and push the tag:

git tag -a v9.9.9 -m "My superawesome change."
git push origin v9.9.9
Commit count: 150

cargo fmt