serde_bibtex

Crates.io	serde_bibtex
lib.rs	serde_bibtex
version	0.6.0
source	src
created_at	2024-01-17 14:07:41.931901
updated_at	2024-10-22 13:42:30.892455
description	A BibTex (de)serialization file format
homepage
repository	https://github.com/autobib/serde_bibtex
max_upload_size
id	1102986
size	316,032

Alex Rutar (alexrutar)

README

WARNING

This crate is under active development and the public API may change substantially on every minor version change. The (de)serialization API is relatively stable, but some of the publicly-exposed internal state may change, particularly concerning the handling of errors. Until this is stabilized, use at your own risk!

Serde bibtex

A Rust library providing a serde interface for .bib file (de)serialization. The implementation is minimally opinionated and feature-rich for convenient downstream consumption by other libraries or binaries.

For examples and thorough documentation of the features, visit the docs.

Deserializer

Here are the main features. See the deserializer docs for more detail.

Flexible

Structured: read into Rust types with automatic @string macro expansion and other convenience features.
Unstructured: do not expand macros or collapse field values, preserving the structure of the original BibTeX.
Deserialize from bytes to defer UTF-8 conversion, or even pass-through raw bytes.
Error-tolerant Iterator API that allows skipping malformed entries.
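The error-tolerant iteration mentioned above can be pictured with a small standard-library sketch. It does not use this crate's API: `Entry`, `parse_entry`, and `parse_tolerant` are hypothetical names, and the toy parser only understands chunks shaped like `@type{key, ...}`. The point is only that an iterator yielding `Result` values lets the caller keep the good entries and skip (or log) the malformed ones.

```rust
/// A hypothetical parsed entry; the real crate deserializes into user-defined types.
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub entry_type: String,
    pub entry_key: String,
}

/// Toy parser for chunks shaped like `@type{key, ...}`.
/// Illustrative stand-in only, not the crate's parser.
pub fn parse_entry(chunk: &str) -> Result<Entry, String> {
    let rest = chunk
        .trim()
        .strip_prefix('@')
        .ok_or_else(|| format!("missing '@' in {:?}", chunk))?;
    let (entry_type, body) = rest
        .split_once('{')
        .ok_or_else(|| format!("missing opening brace in {:?}", chunk))?;
    let body = body
        .strip_suffix('}')
        .ok_or_else(|| format!("missing closing brace in {:?}", chunk))?;
    // The key is everything before the first comma.
    let entry_key = body.split(',').next().unwrap_or("").trim();
    if entry_key.is_empty() {
        return Err(format!("empty key in {:?}", chunk));
    }
    Ok(Entry {
        entry_type: entry_type.trim().to_lowercase(),
        entry_key: entry_key.to_string(),
    })
}

/// Walk an iterator of results, keeping successes and collecting errors
/// instead of aborting at the first malformed entry.
pub fn parse_tolerant(chunks: &[&str]) -> (Vec<Entry>, Vec<String>) {
    let mut entries = Vec::new();
    let mut errors = Vec::new();
    for result in chunks.iter().map(|c| parse_entry(c)) {
        match result {
            Ok(entry) => entries.push(entry),
            Err(err) => errors.push(err),
        }
    }
    (entries, errors)
}
```

Calling `parse_tolerant` on a slice containing one malformed chunk returns the well-formed entries together with one error message, rather than failing as a whole.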

Explicit and unambiguous syntax

Aims for compatibility with, and is tested against, an independently implemented pest grammar.
Aims for compatibility with biber, but without some of biber's undocumented idiosyncrasies or unfixable parsing bugs.

Fast

Low-overhead manual parser implementation (see benchmarks).
Zero-copy deserialization.
Selective capturing of contents (see benchmarks for speed differences).
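As a rough illustration of what zero-copy means here, the following standard-library sketch parses a single `key = {value}` field into a struct whose members are slices of the input buffer. The names `BorrowedField`, `parse_field`, and `is_slice_of` are hypothetical, and the toy syntax ignores nesting and macros; the crate's own borrowed types may look quite different.

```rust
/// A field whose key and value borrow directly from the input buffer.
#[derive(Debug, PartialEq)]
pub struct BorrowedField<'a> {
    pub key: &'a str,
    pub value: &'a str,
}

/// Split `key = {value}` without allocating: both parts are slices of `input`.
/// Toy syntax for illustration only (no nested braces, no macros).
pub fn parse_field(input: &str) -> Option<BorrowedField<'_>> {
    let (key, rest) = input.split_once('=')?;
    let value = rest.trim().strip_prefix('{')?.strip_suffix('}')?;
    Some(BorrowedField { key: key.trim(), value })
}

/// True if `part` lies inside the memory of `whole`, i.e. no copy was made.
pub fn is_slice_of(part: &str, whole: &str) -> bool {
    let start = whole.as_ptr() as usize;
    let p = part.as_ptr() as usize;
    p >= start && p + part.len() <= start + whole.len()
}
```

Because the returned struct carries the lifetime of the input, the borrow checker ensures the buffer outlives every parsed field, which is the trade-off for avoiding allocations.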

Serializer

Here are the main features. See the serializer docs for more detail.

Flexible

Flexibly serialize many types which are vaguely structured like BibTeX entries.
Sufficiently general to generate any valid BibTeX bibliography (up to syntactic equivalence), including all entry types such as @string macros, and outputting unexpanded macros.
Implementable Formatter trait which allows total customization of generated BibTeX.

Opinionated

Default Formatter implementations serialize in a standardized format to guarantee unambiguous parsing even by other tools.
Compact formatter when serializing for consumption by non-humans.
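To make the idea of a pluggable formatter concrete, here is a minimal sketch of how a formatting trait with a human-readable and a compact implementation might be organized. Everything in it is an assumption for illustration: the trait name matches the README, but its methods (`separator`, `field_prefix`, `entry_suffix`), the `Pretty` and `Compact` types, and `write_entry` are hypothetical and are not taken from the crate. See the serializer docs for the actual trait.

```rust
/// Hypothetical formatting hooks; the crate's real `Formatter` trait may differ.
pub trait Formatter {
    /// Text emitted between a field key and its value.
    fn separator(&self) -> &str;
    /// Text emitted before each field.
    fn field_prefix(&self) -> &str;
    /// Text emitted just before the closing brace of an entry.
    fn entry_suffix(&self) -> &str;
}

/// Indented output intended for human readers.
pub struct Pretty;
/// Minimal output intended for machine consumption.
pub struct Compact;

impl Formatter for Pretty {
    fn separator(&self) -> &str { " = " }
    fn field_prefix(&self) -> &str { ",\n  " }
    fn entry_suffix(&self) -> &str { ",\n" }
}

impl Formatter for Compact {
    fn separator(&self) -> &str { "=" }
    fn field_prefix(&self) -> &str { "," }
    fn entry_suffix(&self) -> &str { "" }
}

/// Write one entry using the given formatter.
/// Field values are assumed to have balanced braces already.
pub fn write_entry<F: Formatter>(
    f: &F,
    entry_type: &str,
    key: &str,
    fields: &[(&str, &str)],
) -> String {
    let mut out = String::new();
    out.push('@');
    out.push_str(entry_type);
    out.push('{');
    out.push_str(key);
    for (name, value) in fields {
        out.push_str(f.field_prefix());
        out.push_str(name);
        out.push_str(f.separator());
        out.push('{');
        out.push_str(value);
        out.push('}');
    }
    out.push_str(f.entry_suffix());
    out.push('}');
    out
}
```

The same entry data produces either a single-line or an indented rendering depending only on which formatter is passed in, which is the separation of concerns a formatter trait is meant to provide.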

Robust

Validate during serialization to guarantee generation of valid BibTeX.
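One check of this kind is that a braced field value must have balanced braces, since otherwise the closing brace of the field would be ambiguous. The sketch below shows that single check in isolation; `ValidationError`, `check_balanced`, and `serialize_field` are hypothetical names, backslash escapes are ignored for simplicity, and the crate's real validation rules are likely broader.

```rust
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    UnbalancedBraces,
}

/// Reject values whose braces do not balance.
/// Simplification: backslash-escaped braces are not treated specially.
pub fn check_balanced(value: &str) -> Result<(), ValidationError> {
    let mut depth: usize = 0;
    for ch in value.chars() {
        match ch {
            '{' => depth += 1,
            '}' => {
                // A closing brace with nothing open is an error.
                depth = depth
                    .checked_sub(1)
                    .ok_or(ValidationError::UnbalancedBraces)?;
            }
            _ => {}
        }
    }
    if depth == 0 {
        Ok(())
    } else {
        Err(ValidationError::UnbalancedBraces)
    }
}

/// Validate first, then emit `key = {value}`; invalid input never reaches the output.
pub fn serialize_field(key: &str, value: &str) -> Result<String, ValidationError> {
    check_balanced(value)?;
    Ok(format!("{} = {{{}}}", key, value))
}
```

Validating before writing means an error surfaces at serialization time instead of producing a file that fails to parse later.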

Comparison with other crates

typst/biblatex

We do not attempt to interpret the contents of the entries in the .bib file and instead defer interpretation for downstream consumption. On the other hand, biblatex is intended to support typst, which requires interpreting the contents of the fields (for example, parsing of $math$ in field values). In this sense, we might consider our implementation closer to the biblatex::RawBibliography entrypoint, but with the substantial extra flexibility of reading into any type implementing an appropriate Deserialize.

charlesvdv/nom-bibtex

The functionality in this crate essentially supersedes nom-bibtex. The only feature of nom-bibtex that we do not support is the capturing of comments not explicitly contained in a @comment entry.

typho/bibparser

The functionality in this crate essentially supersedes bibparser.

Benchmarks

The benchmark code can be found in benches/compare.rs. The bibliography file used is assets/tugboat.bib, which is part of the testing data used by biber. It is a 2.64 MB, 73,993-line .bib file.

ignore: Deserialize using serde::de::IgnoredAny to parse the file but ignore the contents.
struct: Deserialize using a struct with entries capturing every field present in assets/tugboat.bib (15 fields total), expanding macros and collapsing field values.
borrow: Deserialize into a fully borrowed Rust type which captures all data in the file but does not expand macros or collapse field values.
biblatex: Parse using biblatex::RawBibliography::parse (most similar to borrow).
copy: Deserialize into an owned Rust type with macro expansion, field value collapsing, and case-insensitive comparison where appropriate.
nom-bibtex: Parse using nom_bibtex::Bibtex::parse (most similar to copy).
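Several of these benchmarks differ in whether they expand macros and collapse field values. As a rough picture of that work, a BibTeX value such as `jan # " 2024"` can be seen as a sequence of tokens that is concatenated into one string after each macro is looked up. The sketch below uses only the standard library; `Token`, `collapse`, and the lowercase macro lookup are illustrative assumptions rather than the crate's types.

```rust
use std::collections::HashMap;

/// One piece of a field value: either a macro reference or literal text.
#[derive(Debug, Clone, PartialEq)]
pub enum Token<'a> {
    Macro(&'a str),
    Text(&'a str),
}

/// Expand macros and concatenate all tokens into a single owned string.
/// Returns `None` if a macro is undefined.
/// Assumption: macro names are stored lowercase and compared case-insensitively.
pub fn collapse(tokens: &[Token<'_>], macros: &HashMap<String, String>) -> Option<String> {
    let mut out = String::new();
    for token in tokens {
        match token {
            Token::Text(text) => out.push_str(text),
            Token::Macro(name) => out.push_str(macros.get(&name.to_lowercase())?),
        }
    }
    Some(out)
}
```

The collapsed form needs an allocation and a lookup per macro, while the uncollapsed token list can borrow from the input, which is one plausible reason the borrowing and copying benchmarks differ in cost.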

The benchmarks were performed on an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz (2019 MacBook Pro). The speedup factor is relative to biblatex.

benchmark	factor	runtime	throughput
ignore	4.8x	`[3.3923 ms 3.3987 ms 3.4058 ms]`	660 MB/s
struct	1.9x	`[8.5496 ms 8.7481 ms 8.9924 ms]`	300 MB/s
borrow	1.3x	`[12.932 ms 12.962 ms 12.992 ms]`	200 MB/s
biblatex	1.0x	`[16.184 ms 16.224 ms 16.266 ms]`	160 MB/s
copy	0.75x	`[21.455 ms 21.690 ms 21.935 ms]`	120 MB/s
nom-bibtex	0.23x	`[71.607 ms 71.912 ms 72.343 ms]`	40 MB/s
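The factor column can be reproduced from the runtimes, assuming the middle value of each runtime triple is the point estimate and that the factor is the biblatex runtime divided by the benchmark's runtime. The helper name `speedup` is introduced here only for illustration.

```rust
/// Speedup of a benchmark relative to a baseline, both given in milliseconds.
/// Values above 1.0 mean the benchmark is faster than the baseline.
pub fn speedup(baseline_ms: f64, runtime_ms: f64) -> f64 {
    baseline_ms / runtime_ms
}

/// Middle runtime values copied from the table above (milliseconds).
pub const RUNTIMES_MS: [(&str, f64); 6] = [
    ("ignore", 3.3987),
    ("struct", 8.7481),
    ("borrow", 12.962),
    ("biblatex", 16.224),
    ("copy", 21.690),
    ("nom-bibtex", 71.912),
];

/// Compute every factor relative to the biblatex row.
pub fn factors() -> Vec<(&'static str, f64)> {
    let baseline = 16.224;
    RUNTIMES_MS
        .iter()
        .map(|(name, ms)| (*name, speedup(baseline, *ms)))
        .collect()
}
```

For example, the struct benchmark gives 16.224 / 8.7481, which rounds to the 1.9x shown in the table.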

The bibparser crate is not included in this benchmark as it is unable to parse the input file.

Safety

This crate uses some unsafe code for string conversions in places where we can guarantee, for other reasons, that a string slice begins and ends on valid code point boundaries.
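The general pattern can be sketched as follows. This is not the crate's code: the function name `slice_unchecked` is hypothetical, and the sketch checks the boundaries at runtime, whereas a parser would typically rely on invariants it has already established (for example, having just matched an ASCII delimiter at that position).

```rust
/// Return `input[start..end]` as a `&str` without re-validating UTF-8,
/// provided both indices fall on character boundaries.
pub fn slice_unchecked(input: &str, start: usize, end: usize) -> Option<&str> {
    if start <= end && input.is_char_boundary(start) && input.is_char_boundary(end) {
        // SAFETY: `input` is valid UTF-8, and a byte range that starts and ends
        // on character boundaries of valid UTF-8 is itself valid UTF-8.
        // `is_char_boundary(end)` also guarantees `end <= input.len()`.
        Some(unsafe { std::str::from_utf8_unchecked(&input.as_bytes()[start..end]) })
    } else {
        None
    }
}
```

Skipping the validation pass is only sound because the boundary condition is known to hold; if it could not be guaranteed, the safe `std::str::from_utf8` would be the right call.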

Commit count: 87
