[![Current crates.io release](https://img.shields.io/crates/v/serde_bibtex)](https://crates.io/crates/serde_bibtex)
[![Documentation](https://img.shields.io/badge/docs.rs-serde__bibtex-66c2a5?labelColor=555555&logoColor=white&logo=)](https://docs.rs/serde_bibtex/)

# WARNING
This crate is under active development and the public API may change substantially on every minor version change.
The (de)serialization API is relatively stable, but some of the publicly exposed internal state may change, particularly concerning the handling of errors.
Until this is stabilized, use at your own risk!

# Serde bibtex
A [Rust](https://www.rust-lang.org/) library providing a [serde](https://serde.rs/) interface for `.bib` file (de)serialization.
The implementation is minimally opinionated and feature-rich for convenient downstream consumption by other libraries or binaries.

For examples and thorough documentation of features, visit the [docs](https://docs.rs/serde_bibtex/latest/serde_bibtex).

## Deserializer
Here are the main features.
See the [deserializer docs](https://docs.rs/serde_bibtex/latest/serde_bibtex/de/index.html) for more detail.

### Flexible
- Structured: read into Rust types with automatic `@string` macro expansion and other convenience features.
- Unstructured: do not expand macros or collect field values, preserving the structure of the original BibTeX.
- Deserialize from bytes to defer UTF-8 conversion, or even pass through raw bytes.
- Error-tolerant `Iterator` API that allows skipping malformed entries.

### Explicit and unambiguous syntax
- Aims for compatibility with, and is tested against, an independently implemented [pest grammar](/src/syntax/bibtex.pest).
- Aims for compatibility with [biber](https://github.com/plk/biber), but without some of biber's [undocumented idiosyncrasies](https://docs.rs/serde_bibtex/latest/serde_bibtex/syntax/index.html#differences-from-biber) or [unfixable parsing bugs](https://github.com/plk/biber/issues/456).
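
To make the structured versus unstructured distinction above more concrete, here is a minimal standard-library sketch. It is not this crate's implementation or API: `Token` and `expand` are hypothetical names, and macro lookup is simplified to an exact-match `HashMap`. The unstructured view keeps a field value as its raw tokens, while the structured view expands `@string` macros and joins the pieces, borrowing from the input when no new string has to be built.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// One piece of a field value: literal text or a reference to a `@string` macro.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, PartialEq)]
enum Token<'a> {
    Text(&'a str),
    Macro(&'a str),
}

/// Expand macros and join the tokens into a single value.
/// Returns `None` if a macro is undefined.
fn expand<'a>(
    tokens: &[Token<'a>],
    macros: &HashMap<&'a str, &'a str>,
) -> Option<Cow<'a, str>> {
    match tokens {
        // A single text token can be borrowed from the input without copying.
        [Token::Text(t)] => Some(Cow::Borrowed(*t)),
        // Anything else needs a freshly allocated string.
        _ => {
            let mut out = String::new();
            for tok in tokens {
                match tok {
                    Token::Text(t) => out.push_str(t),
                    Token::Macro(name) => out.push_str(macros.get(name)?),
                }
            }
            Some(Cow::Owned(out))
        }
    }
}

fn main() {
    let mut macros = HashMap::new();
    macros.insert("tub", "TUGboat");

    // Unstructured view: the tokens exactly as they appear in the source.
    let raw = [Token::Macro("tub"), Token::Text(" 1980")];
    println!("{raw:?}");

    // Structured view: macros expanded and pieces joined.
    println!("{:?}", expand(&raw, &macros));
}
```

The `Cow` return type is one way to express the trade-off: values that need no expansion stay borrowed, and only values that combine several pieces allocate.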

### Fast
- Low-overhead manual parser implementation (see [benchmarks](#benchmarks)).
- Zero-copy deserialization.
- Selective capturing of contents (see [benchmarks](#benchmarks) for speed differences).

## Serializer
Here are the main features.
See the [serializer docs](https://docs.rs/serde_bibtex/latest/serde_bibtex/ser/index.html) for more detail.

### Flexible
- Flexibly serialize many types which are vaguely structured like BibTeX entries.
- Sufficiently general to generate any valid BibTeX bibliography (up to syntactic equivalence), including all entry types such as `@string` macros, and outputting unexpanded macros.
- Implementable `Formatter` trait which allows total customization of generated BibTeX.

### Opinionated
- Default `Formatter` implementations serialize in a standardized format to guarantee unambiguous parsing even by other tools.
- Compact formatter when serializing for consumption by non-humans.

### Robust
- Validate during serialization to guarantee generation of valid BibTeX.

## Comparison with other crates
### [typst/biblatex](https://github.com/typst/biblatex)
We do not attempt to interpret the contents of the entries in the `.bib` file and instead defer interpretation for downstream consumption.
On the other hand, [biblatex](https://github.com/typst/biblatex) is intended to support [typst](https://github.com/typst/typst), which requires interpreting the contents of the fields (for example, parsing of `$math$` in field values).
In this sense, we might consider our implementation closer to the `biblatex::RawBibliography` entrypoint, but with the substantial extra flexibility of reading into any type implementing an appropriate `Deserialize`.

### [charlesvdv/nom-bibtex](https://github.com/charlesvdv/nom-bibtex)
The functionality in this crate essentially supersedes [nom-bibtex](https://github.com/charlesvdv/nom-bibtex).
The only feature of `nom-bibtex` that we do not support is the capturing of comments not explicitly contained in a `@comment` entry.

### [typho/bibparser](https://github.com/typho/bibparser)
The functionality in this crate essentially supersedes [bibparser](https://github.com/typho/bibparser).

## Benchmarks
The benchmark code can be found in [`benches/compare.rs`](/benches/compare.rs).
The bibliography file used is [`assets/tugboat.bib`](/assets/tugboat.bib), which is part of the testing data used by biber.
It is a 2.64 MB 73,993-line `.bib` file.

1. `ignore`: Deserialize using `serde::de::IgnoredAny` to parse the file but ignore the contents.
2. `struct`: Deserialize using a struct with entries capturing every field present in `assets/tugboat.bib` (15 fields total), expanding macros and collapsing field values.
3. `borrow`: Deserialize into a fully borrowed Rust type which captures all data in the file but does not expand macros or collapse field values.
4. `biblatex`: Parse using `biblatex::RawBibliography::parse` (most similar to `borrow`).
5. `copy`: Deserialize into an owned Rust type with macro expansion, field value collapsing, and case-insensitive comparison where appropriate.
6. `nom-bibtex`: Parse using `nom_bibtex::Bibtex::parse` (most similar to `copy`).

The benchmarks were performed on an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz (2019 MacBook Pro).

The speedup factor is relative to `biblatex`.
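
As a rough sketch of how the factor column in the table can be read: the factor appears to be the `biblatex` runtime divided by the benchmark's runtime, using the middle value of each runtime interval. This reading is inferred from the numbers, not taken from the benchmark code, and `speedup` is a hypothetical helper.

```rust
/// Speedup of a benchmark relative to a baseline; both runtimes in the same unit.
/// Values above 1.0 mean the benchmark is faster than the baseline.
fn speedup(baseline_ms: f64, runtime_ms: f64) -> f64 {
    baseline_ms / runtime_ms
}

fn main() {
    // Middle value of each runtime interval, copied from the table.
    let baseline = 16.224; // biblatex
    let rows = [
        ("ignore", 3.3987),
        ("struct", 8.7481),
        ("borrow", 12.962),
        ("copy", 21.690),
        ("nom-bibtex", 71.912),
    ];
    for (name, ms) in rows {
        println!("{name}: {:.2}x", speedup(baseline, ms));
    }
}
```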

| benchmark  | factor | runtime                           | throughput |
|------------|--------|-----------------------------------|------------|
| ignore     | 4.8x   | `[3.3923 ms 3.3987 ms 3.4058 ms]` | 660 MB/s   |
| struct     | 1.9x   | `[8.5496 ms 8.7481 ms 8.9924 ms]` | 300 MB/s   |
| borrow     | 1.3x   | `[12.932 ms 12.962 ms 12.992 ms]` | 200 MB/s   |
| biblatex   | 1.0x   | `[16.184 ms 16.224 ms 16.266 ms]` | 160 MB/s   |
| copy       | 0.75x  | `[21.455 ms 21.690 ms 21.935 ms]` | 120 MB/s   |
| nom-bibtex | 0.23x  | `[71.607 ms 71.912 ms 72.343 ms]` | 40 MB/s    |

The [bibparser](https://github.com/typho/bibparser) crate is not included in this benchmark as it is unable to parse the input file.

## Safety
This crate uses some `unsafe` for string conversions in cases where we can guarantee, for other reasons, that a string slice starts and ends on a valid codepoint boundary.
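
For readers unfamiliar with this pattern, the following is a minimal sketch of the kind of reasoning involved. It is not code from this crate, and `key_before_equals` is a hypothetical function. In UTF-8, an ASCII byte never occurs inside a multi-byte sequence, so cutting a valid string at the position of an ASCII delimiter always yields valid UTF-8, and the validity check can be skipped.

```rust
/// Return the part of `input` before the first `=`, or all of `input` if there is none.
fn key_before_equals(input: &str) -> &str {
    let bytes = input.as_bytes();
    let end = bytes
        .iter()
        .position(|&b| b == b'=')
        .unwrap_or(bytes.len());
    // Sanity check in debug builds only; compiled out in release builds.
    debug_assert!(input.is_char_boundary(end));
    // SAFETY: `input` is valid UTF-8, and `end` is either the index of an
    // ASCII byte (always a char boundary) or `input.len()`. The prefix
    // `bytes[..end]` is therefore valid UTF-8.
    unsafe { std::str::from_utf8_unchecked(&bytes[..end]) }
}

fn main() {
    println!("{:?}", key_before_equals("título = {Ensayo}"));
}
```

The safe alternative, `std::str::from_utf8`, re-validates every byte of the slice; the `unsafe` version skips that work on the strength of the invariant stated in the `SAFETY` comment.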