Crates.io | no_proto |
lib.rs | no_proto |
version | 0.9.60 |
source | src |
created_at | 2020-03-17 03:33:34.837423 |
updated_at | 2021-03-26 20:53:29.652331 |
description | Flexible, Fast & Compact Serialization with RPC |
homepage | https://github.com/only-cliches/NoProto |
repository | https://github.com/only-cliches/NoProto |
max_upload_size | |
id | 219686 |
size | 813,563 |
- Lightweight
- no_std support, WASM ready
- Stable
- Easy
- Fast
- Powerful
- Native byte-wise sorting
- Supports recursive data types
- Supports most common native data types
- Supports collections (list, map, struct & tuple)
- Supports arbitrary nesting of collection types
- Schemas support default values and non-destructive updates
- Transport agnostic RPC Framework
Compiled formats like Flatbuffers, Cap'n Proto and Bincode have amazing performance and extremely compact buffers, but you MUST compile the data types into your application. This means that if the schema of the data changes, the application must be recompiled to accommodate the new schema.
Dynamic formats like JSON, MessagePack and BSON give you the flexibility to store any data with any schema at runtime, but the buffers are fat and performance is somewhere between horrible and hopefully acceptable.
NoProto takes the performance advantages of compiled formats and implements them in a flexible format.
Byte-Wise Sorting Ever try to store a signed integer as a sortable key in a database? NoProto can do that. Almost every data type is stored in the buffer as byte-wise sortable, meaning buffers can be compared at the byte level for sorting without deserializing (a sketch of the underlying technique follows below).
Primary Key Management Compound sortable keys are extremely easy to generate, maintain and update with NoProto. You don't need a custom sort function in your key-value store, you just need this library.
UUID & ULID Support NoProto is one of the few formats that comes with first-class support for these popular primary key data types. It can easily encode, decode and generate these data types.
Fastest Updates NoProto is the only format that supports all mutations without deserializing. It can do the common database read -> update -> write operation 50x to 300x faster than other dynamic formats. Benchmarks
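To make the byte-wise sorting claim concrete, here is a minimal sketch of the general technique for making signed integers sort correctly at the byte level (flip the sign bit, store big-endian). This illustrates the concept only; it is not NoProto's internal code.

```rust
// Conceptual sketch: byte-wise sortable signed integers.
// Flipping the sign bit and storing big-endian makes lexicographic
// byte order match numeric order.
fn sortable_bytes(n: i32) -> [u8; 4] {
    ((n as u32) ^ 0x8000_0000).to_be_bytes()
}

fn main() {
    let mut values = vec![42i32, -7, 0, -1000, 99_999];
    let mut keys: Vec<[u8; 4]> = values.iter().map(|v| sortable_bytes(*v)).collect();

    values.sort();
    keys.sort(); // plain byte-level sort, no decoding needed

    // The byte order now matches the numeric order.
    let decoded: Vec<i32> = keys
        .iter()
        .map(|k| (u32::from_be_bytes(*k) ^ 0x8000_0000) as i32)
        .collect();
    assert_eq!(decoded, values);
}
```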
Format | Zero-Copy | Size Limit | Mutable | Schemas | Byte-wise Sorting |
---|---|---|---|---|---|
Runtime Libs | |||||
NoProto | ✓ | ~4GB | ✓ | ✓ | ✓ |
Apache Avro | ✗ | 2^63 Bytes | ✗ | ✓ | ✓ |
JSON | ✗ | Unlimited | ✓ | ✗ | ✗ |
BSON | ✗ | ~16MB | ✓ | ✗ | ✗ |
MessagePack | ✗ | Unlimited | ✓ | ✗ | ✗ |
Compiled Libs | |||||
FlatBuffers | ✓ | ~2GB | ✗ | ✓ | ✗ |
Bincode | ✓ | ? | ✓ | ✓ | ✗ |
Protocol Buffers | ✗ | ~2GB | ✗ | ✓ | ✗ |
Cap'N Proto | ✓ | 2^64 Bytes | ✗ | ✓ | ✗ |
Veriform | ✗ | ? | ✗ | ✗ | ✗ |
```rust
use no_proto::error::NP_Error;
use no_proto::NP_Factory;
// An ES6 like IDL is used to describe schema for the factory
// Each factory represents a single schema
// One factory can be used to serialize/deserialize any number of buffers
let user_factory = NP_Factory::new(r#"
struct({ fields: {
name: string(),
age: u16({ default: 0 }),
tags: list({ of: string() })
}})
"#)?;
// create a new empty buffer
let mut user_buffer = user_factory.new_buffer(None); // optional capacity
// set the "name" field
user_buffer.set(&["name"], "Billy Joel")?;
// read the "name" field
let name = user_buffer.get::<&str>(&["name"])?;
assert_eq!(name, Some("Billy Joel"));
// set a nested value, the first tag in the tag list
user_buffer.set(&["tags", "0"], "first tag")?;
// read the first tag from the tag list
let tag = user_buffer.get::<&str>(&["tags", "0"])?;
assert_eq!(tag, Some("first tag"));
// close buffer and get internal bytes
let user_bytes: Vec<u8> = user_buffer.finish().bytes();
// open the buffer again
let user_buffer = user_factory.open_buffer(user_bytes);
// read the "name" field again
let name = user_buffer.get::<&str>(&["name"])?;
assert_eq!(name, Some("Billy Joel"));
// get the age field
let age = user_buffer.get::<u16>(&["age"])?;
// returns default value from schema
assert_eq!(age, Some(0u16));
// close again
let user_bytes: Vec<u8> = user_buffer.finish().bytes();
// we can now save user_bytes to disk,
// send it over the network, or whatever else is needed with the data
# Ok::<(), NP_Error>(())
```
- Schemas - Learn how to build & work with schemas.
- Factories - Parsing schemas into something you can work with.
- Buffers - How to create, update & compact buffers/data.
- RPC Framework - How to use the RPC Framework APIs.
- Data & Schema Format - Learn how data is saved into the buffer and schemas.

While it's difficult to properly benchmark libraries like these in a fair way, I've made an attempt in the benchmarks below. These benchmarks are available in the bench folder and you can easily run them yourself with cargo run --release.

The format and data used in the benchmarks were taken from the flatbuffers benchmarks github repo. You should always benchmark/test your own use case for each library before making any choices on what to use.
Legend: Ops / Millisecond, higher is better
Format / Lib | Encode | Decode All | Decode 1 | Update 1 | Size (bytes) | Size (Zlib) |
---|---|---|---|---|---|---|
Runtime Libs | ||||||
NoProto | ||||||
no_proto | 1393 | 1883 | 55556 | 9524 | 308 | 198 |
Apache Avro | ||||||
avro-rs | 156 | 57 | 56 | 40 | 702 | 337 |
FlexBuffers | ||||||
flexbuffers | 444 | 962 | 24390 | 294 | 490 | 309 |
JSON | ||||||
json | 609 | 481 | 607 | 439 | 439 | 184 |
serde_json | 938 | 646 | 644 | 403 | 446 | 198 |
BSON | ||||||
bson | 129 | 116 | 123 | 90 | 414 | 216 |
rawbson | 130 | 1117 | 17857 | 89 | 414 | 216 |
MessagePack | ||||||
rmp | 661 | 623 | 832 | 202 | 311 | 193 |
messagepack-rs | 152 | 266 | 284 | 138 | 296 | 187 |
Compiled Libs | ||||||
Flatbuffers | ||||||
flatbuffers | 3165 | 16393 | 250000 | 2532 | 264 | 181 |
Bincode | ||||||
bincode | 6757 | 9259 | 10000 | 4115 | 163 | 129 |
Postcard | ||||||
postcard | 3067 | 7519 | 7937 | 2469 | 128 | 119 |
Protocol Buffers | ||||||
protobuf | 953 | 1305 | 1312 | 529 | 154 | 141 |
prost | 1464 | 2020 | 2232 | 1040 | 154 | 142 |
Abomonation | ||||||
abomonation | 2342 | 125000 | 500000 | 2183 | 261 | 160 |
Rkyv | ||||||
rkyv | 1605 | 37037 | 200000 | 1531 | 180 | 154 |
- Encode: serialize an object into a Vec<u8>.
- Decode All: deserialize a Vec<u8> into all fields.
- Decode 1: deserialize a Vec<u8> into one field.
- Update 1: update a single field in a Vec<u8>.

Runtime VS Compiled Libs: Some formats require data types to be compiled into the application, which increases performance but means data types cannot change at runtime. If data types need to mutate during runtime or can't be known before the application is compiled (like with databases), you must use a format that doesn't compile data types into the application, like JSON or NoProto.
Complete benchmark source code is available here. Suggestions for improving the quality of these benchmarks are appreciated.
If your use case fits any of the points below, NoProto might be a good choice for your application.
Flexible At Runtime
If you need to work with data types that will change or be created at runtime, you normally have to pick something like JSON, since highly optimized formats like Flatbuffers and Bincode depend on compiling the data types into your application (making everything fixed at runtime). Among formats that can change or implement data types at runtime, NoProto is the fastest format we're aware of (if you know of one that might be faster, let us know!).
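Because schemas are plain strings parsed at runtime, you can assemble one dynamically and hand it to the factory. The sketch below only uses the API calls shown in the example above; the field names and types are hypothetical stand-ins for data that arrives at runtime.

```rust
use no_proto::error::NP_Error;
use no_proto::NP_Factory;

fn main() -> Result<(), NP_Error> {
    // Hypothetical: field names and types only become known at runtime
    // (from a config file, a database catalog, user input, etc.).
    let runtime_fields = vec![("title", "string()"), ("count", "u32()")];

    // Assemble the IDL schema text on the fly...
    let field_defs: Vec<String> = runtime_fields
        .iter()
        .map(|(name, kind)| format!("{}: {}", name, kind))
        .collect();
    let idl = format!("struct({{ fields: {{ {} }} }})", field_defs.join(", "));

    // ...and parse it into a factory without recompiling anything.
    let factory = NP_Factory::new(idl.as_str())?;
    let mut buffer = factory.new_buffer(None);
    buffer.set(&["title"], "hello")?;
    assert_eq!(buffer.get::<&str>(&["title"])?, Some("hello"));
    Ok(())
}
```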
Safely Accept Untrusted Data The worst-case failure mode for NoProto buffers is junk data. While other formats can cause denial of service attacks or allow unsafe memory access, there is no such failure case with NoProto. There is no way to construct a NoProto buffer that would cause any detriment in performance to the host application or lead to unsafe memory access. Also, there is no panic-causing code in the library, meaning it will never crash your application.
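A sketch of what that guarantee means in practice, assuming the no-panic behavior described above; the "untrusted" bytes here are arbitrary junk chosen for illustration.

```rust
use no_proto::error::NP_Error;
use no_proto::NP_Factory;

fn main() -> Result<(), NP_Error> {
    let factory = NP_Factory::new(r#"
        struct({ fields: { name: string() } })
    "#)?;

    // Bytes from an untrusted source: random junk, not a real buffer.
    let untrusted: Vec<u8> = vec![0xde, 0xad, 0xbe, 0xef, 0x00, 0x42];

    // Per the guarantee above, opening never panics; the worst case is
    // junk or missing data.
    let buffer = factory.open_buffer(untrusted);

    // Reads either fail with an error or return junk/None, but the
    // process keeps running and no unsafe memory is touched.
    match buffer.get::<&str>(&["name"]) {
        Ok(value) => println!("got: {:?}", value),
        Err(e) => println!("rejected: {:?}", e),
    }
    Ok(())
}
```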
Extremely Fast Updates
If you have a workflow in your application that is read -> modify -> write with buffers, NoProto will usually outperform every other format, including Bincode and Flatbuffers. This is because NoProto never actually deserializes; it doesn't need to. This includes complicated mutations like pushing a value onto a nested list or replacing entire structs.
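A sketch of that read -> modify -> write cycle using only the API calls from the example above; the schema and field values are assumptions for illustration.

```rust
use no_proto::error::NP_Error;
use no_proto::NP_Factory;

fn main() -> Result<(), NP_Error> {
    // Same schema as the example above.
    let user_factory = NP_Factory::new(r#"
        struct({ fields: {
            name: string(),
            age: u16({ default: 0 }),
            tags: list({ of: string() })
        }})
    "#)?;

    // Pretend these bytes came from disk or the network.
    let mut buffer = user_factory.new_buffer(None);
    buffer.set(&["name"], "Billy Joel")?;
    let stored_bytes: Vec<u8> = buffer.finish().bytes();

    // read -> modify -> write, without deserializing the whole object:
    let mut buffer = user_factory.open_buffer(stored_bytes); // read
    buffer.set(&["age"], 30u16)?;                            // update one field
    buffer.set(&["tags", "0"], "musician")?;                 // set a nested list item
    let updated_bytes: Vec<u8> = buffer.finish().bytes();    // write back out

    // The updated buffer can be re-opened and queried as usual.
    let buffer = user_factory.open_buffer(updated_bytes);
    assert_eq!(buffer.get::<u16>(&["age"])?, Some(30u16));
    Ok(())
}
```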
All Fields Optional, Insert/Update In Any Order
Many formats require that all values be present to close the buffer; further, they may require data to be inserted in a specific order to accommodate the encoding/decoding scheme. With NoProto, all fields are optional and any update/insert can happen in any order.
Incremental Deserializing
You only pay for the fields you read, no more. There is no deserializing step in NoProto; opening a buffer performs no operations. Once you start asking for fields, the library will navigate the buffer using the format rules to get just what you asked for and nothing else. If you have a workflow in your application where you read a buffer and only grab a few fields inside it, NoProto will outperform most other libraries.
Bytewise Sorting
Almost all of NoProto's data types are designed to serialize into bytewise sortable values, including signed integers. When used with Tuples, making database keys with compound sorting is extremely easy. When you combine that with first-class support for UUIDs and ULIDs, NoProto makes an excellent tool for parsing and creating primary keys for databases like RocksDB, LevelDB and TiKV.
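To show what a compound sortable key looks like in principle, here is a generic sketch of the concept (not NoProto's tuple API): concatenating fixed-width, big-endian fields yields keys that sort correctly at the byte level.

```rust
// Conceptual sketch of a compound sortable key: fixed-width, big-endian
// fields concatenated in priority order sort correctly as raw bytes.
// This illustrates the idea behind sorted tuples, not NoProto's API.
fn compound_key(user_id: u32, timestamp: u64) -> Vec<u8> {
    let mut key = Vec::with_capacity(12);
    key.extend_from_slice(&user_id.to_be_bytes());   // primary sort field
    key.extend_from_slice(&timestamp.to_be_bytes()); // secondary sort field
    key
}

fn main() {
    let a = compound_key(7, 1_000);
    let b = compound_key(7, 2_000);
    let c = compound_key(8, 1);

    // Byte-level comparison matches the (user_id, timestamp) ordering,
    // which is exactly what a key-value store like RocksDB needs.
    assert!(a < b);
    assert!(b < c);
}
```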
no_std Support
If you need a serialization format with low memory usage that works in no_std environments, NoProto is one of the few good choices.
Stable
NoProto will never cause a panic in your application. It has zero panics or unwraps, meaning there is no code path that could lead to a panic. Fallback behavior is to provide a sane default path or bubble an error up to the caller.
CPU Independent
All numbers and pointers in NoProto buffers are always stored in big endian, so you can safely create buffers on any CPU architecture and know that they will work with any other CPU architecture.
If you can safely compile all your data types into your application, all the buffers/data is trusted, and you don't intend to mutate buffers after they're created, Bincode/Flatbuffers/Cap'n Proto is a better choice for you.
If your data changes so often that schemas don't really make sense, or the format you use must be self describing, JSON/BSON/MessagePack is a better choice. Although I'd argue that if you can make schemas work, you should: a format with schemas saves a ton of space in the resulting buffers and performs far better.
This library makes use of unsafe to get better performance. Generally speaking, it's not possible to have a high performance serialization library without unsafe. It is only used where the performance improvements are significant, and additional checks are performed so that the worst case for any unsafe block is junk data in a buffer.
MIT License
Copyright (c) 2021 Scott Lott
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.