biome_deserialize

Crates.iobiome_deserialize
lib.rsbiome_deserialize
version0.6.0
sourcesrc
created_at2023-09-27 23:59:13.968856
updated_at2024-05-25 13:34:10.302703
descriptionUtilities to deserialize values
homepagehttps://biomejs.dev/
repositoryhttps://github.com/biomejs/biome
max_upload_size
id985359
size104,877
Victorien Elvinger (Conaclos)

documentation

README

biome_deserialize

biome_deserialize consists of data structures that know how to deserialize themselves along with data formats that know how to deserialize data. It provides a framework by which these two groups interact with each other, allowing any supported data structure to be deserialized using any supported data format.

This crate inspired by serde. 0ne of the main difference is the fault-tolerant behavior of biome_deserialize. Serde uses a fast-fail strategy, while biome_deserialize deserialize as much as possible and report several diagnostics (errors, warning, deprecation messages, ...). Also, biome_deserialize is intended to deserialize textual data formats.

biome_deserialize assumes that every supported data formats supports the following types:

  • null-like values;
  • boolean;
  • number -- integers and floats;
  • string;
  • array;
  • maps of key-value pairs (covers objects).

It currently supports the JSON data format.

Design overview

The two most important traits are Deserializable, DeserializableValue.

  • A type that implements Deserializable is a data structure that can be deserialized from any supported data format;
  • A type that implements DeserializableValue is a data format that can deserialize any supported data structure.

Simple implementations of Deserializable can reuse other deserializable data structures. For instance, an enumeration that corresponds to a string among A, B, and C, can first deserialize a string and then check that the string is one of its values.

Data structures that cannot directly use another deserializable data structures, use a visitor. A visitor is generally a zero-sized data structure that implements the DeserializationVisitor trait. A visitor is a well-known design pattern. It allows selecting an implementation based on the deserialized type without bothering of data format details.

Usage examples

Deserializing common types

biome_deserialize implements Deserializable for common Rust data structure.

In the following example, we deserialize a boolean, an array of integers, and an unordered map of string-integer pairs.

use biome_deserialize::json::deserialize_from_json_str;
use biome_deserialize::Deserialized;
use biome_json_parser::JsonParserOptions;

let json = "false";
let Deserialized {
    deserialized,
    diagnostics,
} = deserialize_from_json_str::<bool>(&source, JsonParserOptions::default());
assert_eq!(deserialized, Some(false));
assert!(diagnostics.is_empty());

let json = "[0, 1]";
let Deserialized {
    deserialized,
    diagnostics,
} = deserialize_from_json_str::<Vec<u8>>(&source, JsonParserOptions::default());
assert_eq!(deserialized, Some(vec![0, 1]));
assert!(diagnostics.is_empty());

use std::collections::HashMap;
let json = r#"{ "a": 0, "b": 1 }"#;
let Deserialized {
    deserialized,
    diagnostics,
} = deserialize_from_json_str::<HashMap<String, u8>>(&source, JsonParserOptions::default());
assert_eq!(deserialized, Some(HashMap::from([("a".to_string(), 0), ("b".to_string(), 1)])));
assert!(diagnostics.is_empty());

Deserializable derive macro

For more complex types, such as structs and enums, biome_deserialize_macros offers a derive macro which will generate an implementation of the Deserializable trait for you.

For example:

#[derive(Clone, Debug, Default, Deserializable)]
pub struct MyRuleOptions {
    behavior: Behavior,
    threshold: u8,
    behavior_exceptions: Vec<String>
}

Extensive documentation for the macro is also available if you hover over it in your IDE.

Note that the macro has two main limitations:

  • The current implementation only supports enums where none of the variants have custom fields, structs with named fields, and newtypes.
  • Every field must also implement Deserializable.

If you want to implement Deserializable for a new primitive type, or a complex type that falls outside the limitations above, you'll need to implement it manually. See below for the section about Implementing a custom deserializer.

Deserializing map and array collections

biome_deserialize requires that every data format supports arrays and maps. A map and an array can be deserialized into several types in Rust.

biome_deserialize is able to deserialize an array into a Vec, a HashSet, or a IndexSet.

  • Vec preserves the insertion order and allows the repetition of values;
  • HashSet doesn't preserve the insertion order and disallows the repetition of values;
  • IndexSet preserves the insertion order and disallows the repetition of values.

biome_deserialize is able to deserialize a map into a HashMap, a BTreeMap, or a IndexMap.

  • HashMap and BTreeMap don't preserve the insertion order;
  • IndexMap preserves the insertion order.

If you hesitate between a collection that preserves the insertion order and one that doesn't, chooses the collection that preserves the insertion order. This often outputs less surprising behavior.

Implementing a custom deserializer

For most enums and structs, an implementation can be generated by the Deserializable macro. However, for primitives, and certain complex types, you may need to implement biome_deserialize::Deserializable yourself.

[!NOTE] If you are curious about the code generated by the macro, Rust Analyzer offers a great feature from the command palette called "Expand macro recursively at caret". Just put your cursor on the word Deserializable in the #[derive(...)] statement and invoke the command. A panel should open with the expanded macro code.

We provide a few examples for custom implementations below.

Custom integer range

Sometimes you want to deserialize an integer and ensure that it is between two given integers.

For instance, let's assume we want to deserialize a day represented by an integer between 1 and 365. We can use the new-type idiom in Rust:

use std::str::FromStr;
use biome_deserialize::{Deserializable, DeserializableValue, DeserializationDiagnostic, TextNumber};

#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash, Ord, PartialOrd)]
pub struct Day(u16);

impl Day {
    pub const MIN: Day = Day(1);
    pub const MAX: Day = Day(365);
    pub fn get(&self) -> u16 {
        self.0
    }
}

impl Default for Day {
    fn default() -> Self {
        Self::MIN
    }
}

impl TryFrom<u16> for Day {
    type Error = &'static str;
    fn try_from(value: u16) -> Result<Self, Self::Error> {
        if (Self::MIN.get()..=Self::MAX.get()).contains(&value) {
            Ok(Self(value))
        } else {
            Err("A day must be between 1 and 365")
        }
    }
}

impl FromStr for Day {
    type Err = &'static str;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse::<u16>()
            .map_err(|_error| "A day must be an integer between 1 and 365")
            .and_then(|value| Day::try_from(value))
    }
}

impl Deserializable for Day {
    fn deserialize(
        value: &impl DeserializableValue,
        name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self> {
        // We deserialize the value into a number represented as a string.
        let value_text = TextNumber::deserialize(value, name, diagnostics)?;
        // We attempt to convert the string into a `Day`.
        value_text.parse::<Day>().map_err(|error| {
            // If the conversion failed, then we report the error.
            diagnostics.push(DeserializationDiagnostic::new(error).with_range(value.range()));
        }).ok()
    }
}

use biome_deserialize::json::deserialize_from_json_str;
use biome_deserialize::Deserialized;
use biome_json_parser::JsonParserOptions;

let json = "42";
let Deserialized {
    deserialized,
    diagnostics,
} = deserialize_from_json_str::<Day>(&source, JsonParserOptions::default());
assert_eq!(deserialized, Some(Day(42)));
assert!(diagnostics.is_empty());

let json = "999";
let Deserialized {
    deserialized,
    diagnostics,
} = deserialize_from_json_str::<Day>(&source, JsonParserOptions::default());
assert_eq!(deserialized, None);
assert_eq!(diagnostics..len(), 1);

Deserializing a union

Sometimes we want to allow several types for a single value. For instance let's assume that we cant to accept a JSON value that is either a string or a boolean.

In this case, we'll need to inspect the type of the DeserializableValue to know which deserializer to use:

use biome_deserialize::{DeserializationDiagnostic, Deserializable, DeserializableValue, DeserializationVisitor, Text, VisitableType};
use biome_rowan::TextRange;

#[derive(Debug, Eq, PartialEq)]
enum Union {
    Bool(bool),
    Str(String),
}

impl Deserializable for Union {
    fn deserialize(
        value: &impl DeserializableValue,
        name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self> {
        if value.is_type(VisitableType::BOOL) {
            biome_deserialize::Deserializable::deserialize(value, rule_name, diagnostics)
                .map(Self::Bool)
        } else {
            biome_deserialize::Deserializable::deserialize(value, rule_name, diagnostics)
                .map(Self::Str)
        }
    }
}

use biome_deserialize::json::deserialize_from_json_str;
use biome_json_parser::JsonParserOptions;

let source = r#" "string" "#;
let deserialized = deserialize_from_json_str::<Union>(&source, JsonParserOptions::default());
assert!(!deserialized.has_errors());
assert_eq!(deserialized.into_deserialized(), Some(Union::Str("string".to_string())));

let source = "true";
let deserialized = deserialize_from_json_str::<Union>(&source, JsonParserOptions::default());
assert!(!deserialized.has_errors());
assert_eq!(deserialized.into_deserialized(), Some(Union::Bool(true)));

Alternatively, we could implement the above using a custom visitor. Note that this approach is much more involved and should only be used as a last resort. Here, we will reimplement Deserializable for our Union type with a visitor for demonstration purposes.

To create your own visitor, start with a struct named after your type with a Visitor suffix. We'll call ours UnionVisitor.

The visitor struct doesn't need any fields, but we do need to implement DeserializationVisitor on it. A DeserializationVisitor provides several visit_ methods and you must implement the visit_ methods of the type(s) you expect. Here we expect either a boolean or a string, so we'll implement visit_bool() and visit_str().

We also have to set the associated type Output to be a union of the types we expect: VisitableType::BOOL.union(VisitableType::STR).

The full example:

impl Deserializable for Union {
    fn deserialize(
        value: &impl DeserializableValue,
        name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self> {
        // Delegate deserialization to `UnionVisitor`
        value.deserialize(UnionVisitor, name, diagnostics)
    }
}

struct UnionVisitor;
impl DeserializationVisitor for UnionVisitor {
    type Output = Union;

    // We expect a `bool` or a `str` as data type.
    const EXPECTED_TYPE: VisitableType = VisitableType::BOOL.union(VisitableType::STR);

    // Because we expect a `bool` or a `str`, we have to implement the associated method `visit_bool`.
    fn visit_bool(
        self,
        value: bool,
        range: TextRange,
        _name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self::Output> {
        Some(Union::Bool(value))
    }

    // Because we expect a `bool` or a `str`, we have to implement the associated method `visit_str`.
    fn visit_str(
        self,
        value: Text,
        range: TextRange,
        _name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self::Output> {
        Some(Union::Str(value.text().to_string()))
    }
}

Implementing Deserializable for an enumeration of values

Let's assume that we want to deserialize a JSON string that is either A, or B. We represent this string by a Rust's enum:

To implement Deserializable for Variant, we can reuse the String, because biome_deserialize implements Deserializable for the String type.

Our implementation attempts to deserialize a string and creates the corresponding variant. If the variant is not known, we report a diagnostic.

use biome_deserialize::{Deserializable, DeserializableValue, DeserializationDiagnostic};

#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash, Ord, PartialOrd)]
pub enum Variant { A, B }

impl Deserializable for Variant {
    fn deserialize(
        value: &impl DeserializableValue,
        name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self> {
        match String::deserialize(value, name, diagnostics)? {
            "A" => Some(Variant::A),
            "B" => Some(Variant::B),
            unknown_variant => {
                const ALLOWED_VARIANTS: &[&str] = &["A", "B"];
                diagnostics.push(DeserializationDiagnostic::new_unknown_value(
                    unknown_variant,
                    value.range(),
                    ALLOWED_VARIANTS,
                ));
                None
            }
        }
    }
}

use biome_deserialize::json::deserialize_from_json_str;
use biome_deserialize::Deserialized;
use biome_json_parser::JsonParserOptions;

let json = "\"A\"";
let Deserialized {
    deserialized,
    diagnostics,
} = deserialize_from_json_str::<Day>(&source, JsonParserOptions::default());
assert_eq!(deserialized, Some(Variant::A));
assert!(diagnostics.is_empty());

We can improve our implementation by avoiding the heap-allocation of a string. To do this, we use Text instead of String. Internally Text borrows a slice of the source.

use biome_deserialize::{Deserializable, DeserializableValue, DeserializationDiagnostic, Text};

#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash, Ord, PartialOrd)]
pub enum Variant { A, B }

impl Deserializable for Variant {
    fn deserialize(
        value: &impl DeserializableValue,
        name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self> {
        match Text::deserialize(value, name, diagnostics)?.text() {
            "A" => Some(Variant::A),
            "B" => Some(Variant::B),
            unknown_variant => {
                const ALLOWED_VARIANTS: &[&str] = &["A", "B"];
                diagnostics.push(DeserializationDiagnostic::new_unknown_value(
                    unknown_variant,
                    value.range(),
                    ALLOWED_VARIANTS,
                ));
                None
            }
        }
    }
}

Implementing Deserializable for a struct

Let's assume we want to deserialize a JSON map (object) into an instance of a Rust struct.

Because a struct has custom fields and types, we cannot reuse an existing deserializable type. We have to delegate the deserialization to a visitor.

To do that, we create a zero-sized struct PersonViistor that implements DeserializationVisitor. A DeserializationVisitor provides several visit_ methods. You must implement the visit_ methods of the type you expect. Here we expect a map of key-value pairs. Thus, we implement visit_map and set the associated constant EXPECTED_TYPE to VisitableType::MAP. We also set the associated type Output to the type that we want to produce: Person.

The implementation of Deserialziable for Person simply delegates the deserialization of the visitor. Internally, the deserialization of value calls the visit_ method that corresponds to its type. If the value is a map of key-value pairs, then visit_map is called. Otherwise, another visit_ method is called. The default implementation of a visit_ method reports an incorrect type diagnostic.

Our implementation of visit_map traverses the map of key-value pairs. It attempts to deserialize every key as a string and deserialize the value according the key's content. If the key is name, then we deserialize a String. If the key is age, then we deserialize a u8. String and u8 implements Deserializable. Thus, we can reuse String::deserialize and u8::deserialize.

Note that if you use Serde in tandem with biome_deserialize, you have to disambiguate the call to deserialize. Thus, instead of using String::deserialize and u8::deserialize, you should use Deserialize::deserialize.

use biome_deserialize::{DeserializationDiagnostic, Deserializable, DeserializableValue, DeserializationVisitor, Text, VisitableType};
use biome_rowan::TextRange;

#[derive(Debug, Default, Eq, PartialEq, Clone)]
pub struct Person { name: String, age: u8 }

impl Deserializable for Person {
    fn deserialize(
        value: &impl DeserializableValue,
        name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self> {
        // Delegate the deserialization to `PersonVisitor`.
        // `value` will call the `PersonVisitor::viist_` method that corresponds to its type.
        value.deserialize(PersonVisitor, name, diagnostics)
    }
}

struct PersonVisitor;
impl DeserializationVisitor for PersonVisitor {
    // The visitor deserialize a [Person].
    type Output = Person;

    // We expect a `map` as data type.
    const EXPECTED_TYPE: VisitableType = VisitableType::MAP;

    // Because we expect a `map`, we have to implement the associated method `visit_map`.
    fn visit_map(
        self,
        // Iterator of key-value pairs.
        members: impl Iterator<Item = Option<(impl DeserializableValue, impl DeserializableValue)>>,
        // range of the map in the source text.
        range: TextRange,
        _name: &str,
        diagnostics: &mut Vec<DeserializationDiagnostic>,
    ) -> Option<Self::Output> {
        let mut result = Person::default();
        for (key, value) in members.flatten() {
            // Try to deserialize the key as a string.
            // We use `Text` to avoid an heap-allocation.
            let Some(key_text) = Text::deserialize(&key, "", diagnostics) else {
                // If this failed, then pass to the next key-value pair.
                continue;
            };
            match key_text.text() {
                "name" => {
                    if let Some(name) = String::deserialize(&value, &key_text, diagnostics) {
                        result.name = name;
                    }
                },
                "age" => {
                    if let Some(age) = u8::deserialize(&value, &key_text, diagnostics) {
                        result.age = age;
                    }
                },
                unknown_key => {
                    const ALLOWED_KEYS: &[&str] = &["name"];
                    diagnostics.push(DeserializationDiagnostic::new_unknown_key(
                        unknown_key,
                        key.range(),
                        ALLOWED_KEYS,
                    ));
                }
            }
        }
        Some(result)
    }
}

use biome_deserialize::json::deserialize_from_json_str;
use biome_json_parser::JsonParserOptions;

let source = r#"{ "name": "Isaac Asimov" }"#;
let deserialized = deserialize_from_json_str::<Person>(&source, JsonParserOptions::default());
assert!(!deserialized.has_errors());
assert_eq!(deserialized.into_deserialized(), Some(Person { name: "Isaac Asimov".to_string() }));
Commit count: 7051

cargo fmt