serde_bibtex

Crates.io: serde_bibtex
lib.rs: serde_bibtex
version: 0.3.1
source: src
created_at: 2024-01-17 14:07:41.931901
updated_at: 2024-06-03 22:32:37.517747
description: A BibTex (de)serialization file format
homepage:
repository: https://github.com/autobib/serde_bibtex
max_upload_size:
id: 1102986
size: 227,464
Alex Rutar (alexrutar)

WARNING

This crate is under active development and the public API may change substantially on every minor version change. The deserialization API is relatively stable, but serialization is not yet implemented and some of the publicly-exposed internal state may change. Until this is stabilized, use at your own risk!

Serde bibtex

A Rust library providing a serde interface for .bib file (de)serialization. The implementation is minimally opinionated and feature-rich for convenient downstream consumption by other libraries or binaries.

For examples and thorough documentation of the features, visit the docs.

Deserializer

Here are the main features.

Flexible

  • Structured: read into Rust types with automatic @string macro expansion and other convenience features.
  • Unstructured: do not expand macros or collect field values, preserving the structure of the original BibTeX.
  • Deserialize from bytes to defer UTF-8 conversion, or even pass-through raw bytes.
  • Error-tolerant Iterator API that allows skipping malformed entries.
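
The error-tolerant iterator idea can be illustrated with a toy sketch that uses only the standard library. This is not the serde_bibtex API: `Entry`, `ParseError`, `parse_chunk`, and `entries` are hypothetical names, and the parser only understands a drastically simplified entry syntax. The point is the shape of the interface: the iterator yields one `Result` per entry, so the caller decides whether to stop at the first error or skip it.

```rust
/// Hypothetical, heavily simplified view of a bibliography entry.
#[derive(Debug, PartialEq)]
struct Entry<'a> {
    entry_type: &'a str,
    entry_key: &'a str,
}

/// Hypothetical error type carrying a human-readable message.
#[derive(Debug, PartialEq)]
struct ParseError(String);

/// Toy parser for one chunk such as `article{key, title = {...}}`.
/// It only extracts the entry type and key; fields are ignored.
fn parse_chunk(chunk: &str) -> Result<Entry<'_>, ParseError> {
    let (entry_type, rest) = chunk
        .split_once('{')
        .ok_or_else(|| ParseError(format!("missing '{{' in {:?}", chunk)))?;
    let (entry_key, _fields) = rest
        .split_once(',')
        .ok_or_else(|| ParseError(format!("missing ',' after key in {:?}", chunk)))?;
    Ok(Entry {
        entry_type: entry_type.trim(),
        entry_key: entry_key.trim(),
    })
}

/// Yields one `Result` per `@`-delimited chunk, so a malformed entry
/// does not prevent later entries from being read.
/// (A toy splitter: a real parser must not split on `@` inside field values.)
fn entries(input: &str) -> impl Iterator<Item = Result<Entry<'_>, ParseError>> {
    input
        .split('@')
        .map(str::trim)
        .filter(|chunk| !chunk.is_empty())
        .map(parse_chunk)
}
```

With this shape, `entries(input).filter_map(Result::ok)` skips malformed entries, while `entries(input).collect::<Result<Vec<_>, _>>()` stops at the first error.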

Explicit and unambiguous syntax

Fast

  • Low overhead manual parser implementation (see benchmarks).
  • Zero-copy deserialization.
  • Selective capturing of contents (see benchmarks for speed differences).
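
As a rough illustration of what zero-copy means here, the following standard-library sketch borrows slices of the input instead of allocating new strings, and only allocates when several fragments must be joined. The helper names `strip_braces` and `collapse` are hypothetical and do not come from the crate.

```rust
use std::borrow::Cow;

/// Strips one pair of outer braces from a raw field value.
/// The result is a sub-slice of `raw`; nothing is copied.
fn strip_braces(raw: &str) -> &str {
    raw.strip_prefix('{')
        .and_then(|rest| rest.strip_suffix('}'))
        .unwrap_or(raw)
}

/// Joins value fragments into one string.
/// A single fragment is returned as a borrow of the input;
/// several fragments require an allocation to concatenate them.
fn collapse<'a>(parts: &[&'a str]) -> Cow<'a, str> {
    match parts {
        [single] => Cow::Borrowed(*single),
        _ => Cow::Owned(parts.concat()),
    }
}
```

A borrowed result stays tied to the lifetime of the input buffer, which is the usual trade-off of zero-copy parsing: the input must outlive the parsed data.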

Serializer

TODO: not yet implemented

Comparison with other crates

typst/biblatex

We do not attempt to interpret the contents of the entries in the .bib file and instead defer interpretation for downstream consumption. On the other hand, biblatex is intended to support typst, which requires interpreting the contents of the fields (for example, parsing of $math$ in field values). In this sense, we might consider our implementation closer to the biblatex::RawBibliography entrypoint, but with the substantial extra flexibility of reading into any type implementing an appropriate Deserialize.

charlesvdv/nom-bibtex

The functionality in this crate essentially supersedes nom-bibtex. The only feature of nom-bibtex that we do not support is the capturing of comments not explicitly contained in a @comment entry.

typho/bibparser

The functionality in this crate essentially supersedes bibparser.

Benchmarks

The benchmark code can be found in benches/compare.rs. The bibliography file used is assets/tugboat.bib, which is part of the testing data used by biber. It is a 2.64 MB, 73,993-line .bib file.

  1. ignore: Deserialize using serde::de::IgnoredAny to parse the file but ignore the contents.
  2. struct: Deserialize using a struct with entries capturing every field present in assets/tugboat.bib (15 fields total), expanding macros and collapsing field values.
  3. borrow: Deserialize into a fully borrowed Rust type which captures all data in the file but does not expand macros or collapse field values.
  4. biblatex: Parse using biblatex::RawBibliography::parse (most similar to borrow).
  5. copy: Deserialize into an owned Rust type with macro expansion, field value collapsing, and case-insensitive comparison where appropriate.
  6. nom-bibtex: Parse using nom_bibtex::Bibtex::parse (most similar to copy).
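
Several of the benchmarks above differ in whether they expand macros and collapse field values. The following standard-library sketch shows what that step involves for a value such as `t # " and more"` after `@string{t = {Title}}`. The `Token` type and the `expand` function are hypothetical illustrations and are not the crate's internal representation.

```rust
use std::collections::HashMap;

/// One fragment of a field value: literal text or a reference to a
/// macro defined earlier by an `@string` entry.
enum Token<'a> {
    Text(&'a str),
    Macro(&'a str),
}

/// Expands every macro reference and concatenates the fragments into
/// a single owned string. Returns an error naming the first macro
/// that has no definition.
fn expand(tokens: &[Token<'_>], macros: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::new();
    for token in tokens {
        match token {
            Token::Text(text) => out.push_str(text),
            Token::Macro(name) => match macros.get(*name) {
                Some(value) => out.push_str(value),
                None => return Err(format!("undefined macro: {}", name)),
            },
        }
    }
    Ok(out)
}
```

Skipping this step, as the borrow benchmark does, avoids the allocation for the output string and keeps the original token structure available to the caller.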

The benchmarks were performed on an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz (2019 MacBook Pro).

benchmark    factor   runtime                              throughput
ignore       0.18x    [3.3923 ms 3.3987 ms 3.4058 ms]      660 MB/s
struct       0.67x    [8.5496 ms 8.7481 ms 8.9924 ms]      300 MB/s
borrow       1.0x     [12.932 ms 12.962 ms 12.992 ms]      200 MB/s
biblatex     1.3x     [16.184 ms 16.224 ms 16.266 ms]      160 MB/s
copy         1.7x     [21.455 ms 21.690 ms 21.935 ms]      120 MB/s
nom-bibtex   5.5x     [71.607 ms 71.912 ms 72.343 ms]      40 MB/s

The bibparser crate is not included in this benchmark as it is unable to parse the input file.
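
The README does not state how the factor and throughput columns were derived. A plausible reading, assumed here, is that the factor is the runtime relative to the borrow benchmark (which is listed as 1.0x) and that the throughput is the file size divided by the middle runtime estimate. The sketch below applies that assumed relation; the function names are illustrative.

```rust
/// Size of assets/tugboat.bib as stated in the text, in megabytes.
const FILE_SIZE_MB: f64 = 2.64;

/// Throughput in MB/s for a runtime given in milliseconds.
fn throughput_mb_per_s(runtime_ms: f64) -> f64 {
    FILE_SIZE_MB / (runtime_ms / 1000.0)
}

/// Runtime relative to a baseline runtime (assumed: the borrow benchmark).
fn factor(runtime_ms: f64, baseline_ms: f64) -> f64 {
    runtime_ms / baseline_ms
}
```

For example, `throughput_mb_per_s(12.962)` is about 204 MB/s, consistent with the rounded 200 MB/s in the borrow row, and `factor(8.7481, 12.962)` is about 0.67, consistent with the struct row.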

Safety

This crate uses some unsafe code for string conversions in places where we can guarantee, for other reasons, that a string slice begins and ends at a valid codepoint boundary.
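
To illustrate the kind of reasoning involved, here is a minimal standard-library sketch. It is not taken from the crate's source, and the function name `split_at_ascii` is hypothetical. It relies on a property of UTF-8: an ASCII byte never occurs inside a multi-byte sequence, so cutting valid UTF-8 immediately before or after an ASCII byte always yields valid UTF-8.

```rust
/// Splits `input` at the first occurrence of the ASCII byte `delim`,
/// returning the text before and after the delimiter.
fn split_at_ascii(input: &str, delim: u8) -> Option<(&str, &str)> {
    // Refuse non-ASCII delimiters: the safety argument below depends on it.
    if !delim.is_ascii() {
        return None;
    }
    let bytes = input.as_bytes();
    let pos = bytes.iter().position(|&b| b == delim)?;
    let (head, tail) = (&bytes[..pos], &bytes[pos + 1..]);
    // SAFETY: `input` is valid UTF-8 and `bytes[pos]` is an ASCII byte.
    // ASCII bytes never appear inside a multi-byte UTF-8 sequence, so
    // `pos` and `pos + 1` are both character boundaries and each half
    // is itself valid UTF-8.
    unsafe {
        Some((
            std::str::from_utf8_unchecked(head),
            std::str::from_utf8_unchecked(tail),
        ))
    }
}
```

The safe equivalent, `std::str::from_utf8`, would re-validate bytes that are already known to be valid, which is the cost that this pattern avoids.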

Commit count: 61