# msgpack-schema [![Crates.io](https://img.shields.io/crates/v/msgpack-schema)](https://crates.io/crates/msgpack-schema) [![docs.rs](https://img.shields.io/docsrs/msgpack-schema)](https://docs.rs/msgpack-schema/) [![CI](https://github.com/Idein/msgpack-schema/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/Idein/msgpack-schema/actions/workflows/ci.yml) _msgpack-schema_ is a schema language for describing data formats encoded in MessagePack. It provides two derive macros `Serialize` and `Deserialize` that allow you to transcode MessagePack binary data to/from Rust data structures in a type-directed way. ```rust use msgpack_schema::{Deserialize, Serialize}; #[derive(Deserialize, Serialize)] struct Human { #[tag = 0] name: String, #[tag = 2] #[optional] age: Option, } ``` Compared with other schema languages like `rmp-serde`, `msgpack-schema` allows to specify more compact data representation, e.g., fixints as field keys, fixints as variant keys, etc. ## Feature flags - `proptest`: Enable `proptest::arbitrary::Arbitrary` impls for `msgpack_value::Value`. ## Behaviours of serializers and deserializers ### Structs with named fields Structs with named fields are serialized into a `Map` object where keys are fixints specified by `#[tag]` attributes. The current implementation serializes fields in order but one must not rely on this behavior. The deserializer interprets `Map` objects to create such structs. Field order is irrelevant to the result. If `Map` objects contains extra key-value pairs which are not contained in the definition of the struct, the deserializer simply ignores them. If there are two or more values with the same key within a `Map` object, the preceding value is overwritten by the last value. ```rust #[derive(Serialize, Deserialize)] struct S { #[tag = 0] x: u32, #[tag = 1] y: String, } let s = S { x: 42, y: "hello".to_owned(), }; let b = b"\x82\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 10 bytes; `{ 0: 42, 1: "hello" }` assert_eq!(serialize(&s), b); assert_eq!(s, deserialize(b).unwrap()); // ignores irrelevant key-value pairs let b = b"\x83\x00\x2A\x02\xC3\x01\xA5\x68\x65\x6c\x6c\x6f"; // 12 bytes; `{ 0: 42, 2: true, 1: "hello" }` assert_eq!(s, deserialize(b).unwrap()); // last value wins let b = b"\x83\x00\xC3\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 12 bytes; `{ 0: true, 0: 42, 1: "hello" }` assert_eq!(s, deserialize(b).unwrap()); ``` Fields in named structs may be tagged with `#[optional]`. - The tagged field must be of type `Option`. - On serialization, the key-value pair will not be included in the result map object when the field data contains `None`. - On deserialization, the field of the result struct will be filled with `None` when the given MsgPack map object contains no corresponding key-value pair. ```rust #[derive(Serialize, Deserialize)] struct S { #[tag = 0] x: u32, #[optional] #[tag = 1] y: Option, } let s = S { x: 42, y: Some("hello".to_owned()), }; let b = b"\x82\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 10 bytes; `{ 0: 42, 1: "hello" }` assert_eq!(serialize(&s), b); assert_eq!(s, deserialize(b).unwrap()); let s = S { x: 42, y: None, }; let b = b"\x81\x00\x2A"; // 3 bytes; `{ 0: 42 }` assert_eq!(serialize(&s), b); assert_eq!(s, deserialize(b).unwrap()); ``` The `#[flatten]` attribute is used to factor out a single definition of named struct into multiple ones. ```rust #[derive(Serialize)] struct S1 { #[tag = 1] x: u32, } #[derive(Serialize)] struct S2 { #[flatten] s1: S1, #[tag = 2] y: u32, } #[derive(Serialize)] struct S3 { #[tag = 1] x: u32, #[tag = 2] y: u32, } assert_eq!(serialize(S2 { s1: S1 { x: 42 }, y: 43, }), serialize(S3 { x: 42, y: 43 })); ``` Structs with named fields may be attached `#[untagged]`. Untagged structs are serialized into an array and will not contain tags. ```rust #[derive(Serialize, Deserialize)] #[untagged] struct S { x: u32, y: String, } let s = S { x: 42, y: "hello".to_owned(), }; let b = b"\x92\x2A\xA5\x68\x65\x6c\x6c\x6f"; // 8 bytes; `[ 42, "hello" ]` assert_eq!(serialize(&s), b); assert_eq!(s, deserialize(b).unwrap()); ``` ### Newtype structs Tuple structs with only one element are treated transparently. ```rust #[derive(Serialize, Deserialize)] struct S(u32); let s = S(42); let b = b"\x2A"; // 1 byte; `42` assert_eq!(serialize(&s), b); assert_eq!(s, deserialize(b).unwrap()); ``` ### Unit structs and empty tuple structs Serialization and deserialization of unit structs and empty tuple structs are intentionally unsupported. ```rust // It is error to derive `Serialize` / `Deserialize` for these types of structs. struct S1; struct S2(); ``` ### Tuple structs Tuple structs with more than one element are encoded as an array. It is validation error to deserialize an array with unmatched length. ```rust #[derive(Serialize, Deserialize)] struct S(u32, bool); let s = S(42, true); let b = b"\x92\x2A\xC3"; // 3 bytes; `[ 42, true ]` assert_eq!(serialize(&s), b); assert_eq!(s, deserialize(b).unwrap()); ``` ### Unit variants and empty tuple variants Unit variants and empty tuple variants are serialized into a single fixint whose value is determined by the tag. ```rust #[derive(Serialize, Deserialize)] enum E { #[tag = 3] Foo } let e = E::Foo; let b = b"\x03"; // 1 byte; `3` assert_eq!(serialize(&e), b); assert_eq!(e, deserialize(b).unwrap()); ``` ```rust #[derive(Serialize, Deserialize)] enum E { #[tag = 3] Foo() } let e = E::Foo(); let b = b"\x03"; // 1 byte; `3` assert_eq!(serialize(&e), b); assert_eq!(e, deserialize(b).unwrap()); ``` ### Newtype variants Newtype variants (one-element tuple variants) are serialized into an array of the tag and the inner value. ```rust #[derive(Serialize, Deserialize)] enum E { #[tag = 3] Foo(u32) } let e = E::Foo(42); let b = b"\x92\x03\x2A"; // 3 bytes; `[ 3, 42 ]` assert_eq!(serialize(&e), b); assert_eq!(e, deserialize(b).unwrap()); ``` ### Untagged variants Enums may be attached `#[untagged]` when all variants are newtype variants. Serializing untagged variants results in the same data layout as the inner type. The deserializer deserializes into an untagged enum type by trying deserization one by one from the first variant to the last. ```rust #[derive(Serialize, Deserialize)] #[untagged] enum E { Foo(String), Bar(u32), } let e = E::Bar(42); let b = b"\x2A"; // 1 byte; `42` assert_eq!(serialize(&e), b); assert_eq!(e, deserialize(b).unwrap()); ``` ## Write your own implementation of `Serialize` and `Deserialize` You may want to write your own implementation of `Serialize` and `Deserialize` in the following cases: 1. You need `impl` for types that are already defined by someone. 2. You need extreme efficiency. 3. Both. [IpAddr](std::net::IpAddr) is such a type satisfying (3). In the most efficient situation, we want it to be 4 or 16 byte length plus one byte for a tag at any time. This is achieved by giving a hard-written implementation like below. ```rust struct IpAddr(pub std::net::IpAddr); impl Serialize for IpAddr { fn serialize(&self, serializer: &mut Serializer) { match self.0 { std::net::IpAddr::V4(v4) => { serializer.serialize_str(&v4.octets()); // 5 bytes } std::net::IpAddr::V6(v6) => { serializer.serialize_str(&v6.octets()); // 17 bytes } } } } impl Deserialize for IpAddr { fn deserialize(deserializer: &mut Deserializer) -> Result { let Str(data) = deserializer.deserialize()?; let ipaddr = match data.len() { 4 => std::net::IpAddr::V4(std::net::Ipv4Addr::from( <[u8; 4]>::try_from(data).unwrap(), )), 16 => std::net::IpAddr::V6(std::net::Ipv6Addr::from( <[u8; 16]>::try_from(data).unwrap(), )), _ => return Err(ValidationError.into()), }; Ok(Self(ipaddr)) } } ``` ## Appendix: Cheatsheet
schema Rust MessagePack (human readable)
struct S {
    #[tag = 0]
    x: u32,
    #[tag = 1]
    y: bool,
}
S { x: 42, y: true } { 0: 42, 1: true }
struct S {
    #[optional]
    #[tag = 0]
    x: Option<u32>,
}
S { x: Some(42) } { 0: 42 }
struct S {
    #[optional]
    #[tag = 0]
    x: Option<u32>,
}
S { x: None } {}
#[untagged]
struct S {
    #[tag = 0]
    x: u32,
    #[tag = 1]
    y: bool,
}
S { x: 42, y: true } [ 42, true ]
struct S(u32) S(42) 42
struct S S UNSUPPORTED
struct S() S() UNSUPPORTED
struct S(u32, bool) S(42, true) [ 42, true ]
enum E {
    #[tag = 3]
    Foo
}
E::Foo 3
enum E {
    #[tag = 3]
    Foo()
}
E::Foo() 3
enum E {
    #[tag = 3]
    Foo(u32)
}
E::Foo(42) [ 3, 42 ]
#[untagged]
enum E {
    Foo(u32)
    Bar(bool)
}
E::Bar(true) true
#### License Licensed under MIT license.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in msgpack-schema by you shall be licensed as above, without any additional terms or conditions.