Crates.io | serde_bibtex |
lib.rs | serde_bibtex |
version | 0.6.0 |
source | src |
created_at | 2024-01-17 14:07:41.931901 |
updated_at | 2024-10-22 13:42:30.892455 |
description | A BibTex (de)serialization file format |
homepage | |
repository | https://github.com/autobib/serde_bibtex |
max_upload_size | |
id | 1102986 |
size | 316,032 |
This crate is under active development and the public API may change substantially on every minor version change. The (de)serialization API is relatively stable, but some of the publicly-exposed internal state may change, particularly concerning the handling of errors. Until this is stabilized, use at your own risk!
A Rust library providing a serde interface for `.bib` file (de)serialization.
The implementation is minimally opinionated and feature-rich for convenient downstream consumption by other libraries or binaries.
For examples and a thorough documentation of features, visit the docs.
Here are the main features of the deserializer. See the deserializer docs for more detail.

- `@string` macro expansion and other convenience features.
- `Iterator` API that allows skipping malformed entries.
- Low overhead manual parser implementation (see benchmarks).
- Zero-copy deserialization.
- Selective capturing of contents (see benchmarks for speed differences).
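The iterator-style error handling and zero-copy borrowing listed above can be illustrated with a small std-only sketch. This is not the serde_bibtex API: `parse_field`, `fields`, `FieldError`, and the one-line `key = value` input format are hypothetical stand-ins, chosen only to show the pattern of yielding one `Result` per item while borrowing from the input buffer.

```rust
/// Hypothetical error type for a line that is not `key = value`.
#[derive(Debug, PartialEq)]
struct FieldError {
    line: usize,
}

/// Parse one `key = value` line into two slices borrowed from `line`.
/// No `String` is allocated: both halves point into the input buffer.
fn parse_field(line_no: usize, line: &str) -> Result<(&str, &str), FieldError> {
    match line.split_once('=') {
        Some((key, value)) if !key.trim().is_empty() => Ok((key.trim(), value.trim())),
        _ => Err(FieldError { line: line_no }),
    }
}

/// Lazily yield one `Result` per input line, so the caller decides
/// whether a malformed line is fatal or can be skipped.
fn fields(input: &str) -> impl Iterator<Item = Result<(&str, &str), FieldError>> {
    input
        .lines()
        .enumerate()
        .map(|(i, line)| parse_field(i + 1, line))
}

fn main() {
    let input = "author = Knuth\nthis line is malformed\nyear = 1984";

    // Skip malformed lines and keep the rest.
    let good: Vec<(&str, &str)> = fields(input).filter_map(Result::ok).collect();
    println!("{good:?}");

    // Or stop at the first error instead.
    let strict: Result<Vec<(&str, &str)>, FieldError> = fields(input).collect();
    println!("{strict:?}");
}
```

Because the iterator yields `Result` items, the same parser serves both lenient and strict callers without a separate configuration flag.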
Here are the main features of the serializer. See the serializer docs for more detail.

- Support for `@string` macros, and for outputting unexpanded macros.
- `Formatter` trait which allows total customization of generated BibTeX.
- `Formatter` implementations serialize in a standardized format to guarantee unambiguous parsing even by other tools.
- Validate during serialization to guarantee generation of valid BibTeX.
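As a rough illustration of what validating during serialization can mean, the sketch below refuses to write a brace-delimited field value whose braces are unbalanced, since such a value would end early or swallow the text that follows it. The names `braces_balanced` and `write_field` are hypothetical and are not taken from this crate; the checks the crate actually performs are described in the serializer docs.

```rust
use std::fmt::Write;

/// Hypothetical validity check: a brace-delimited BibTeX value must have
/// balanced braces, otherwise it would terminate early or swallow later text.
fn braces_balanced(value: &str) -> bool {
    let mut depth = 0usize;
    for c in value.chars() {
        match c {
            '{' => depth += 1,
            '}' => match depth.checked_sub(1) {
                Some(d) => depth = d,
                None => return false, // closing brace with no opener
            },
            _ => {}
        }
    }
    depth == 0
}

/// Write `  key = {value},` followed by a newline to `out`,
/// refusing values that fail the check.
fn write_field(out: &mut String, key: &str, value: &str) -> Result<(), String> {
    if !braces_balanced(value) {
        return Err(format!("unbalanced braces in field `{key}`"));
    }
    writeln!(out, "  {key} = {{{value}}},").map_err(|e| e.to_string())
}

fn main() {
    let mut out = String::new();
    write_field(&mut out, "title", "The {TeX}book").unwrap();
    print!("{out}");

    // A stray closing brace is rejected before anything is written.
    let rejected = write_field(&mut out, "note", "stray } brace");
    println!("{rejected:?}");
}
```

Checking before writing means a rejected value leaves the output buffer untouched, so a caller can report the error and continue with the next entry.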
We do not attempt to interpret the contents of the entries in the `.bib` file and instead defer interpretation for downstream consumption.
On the other hand, the biblatex crate is intended to support typst, which requires interpreting the contents of the fields (for example, parsing of `$math$` in field values).
In this sense, we might consider our implementation closer to the `biblatex::RawBibliography` entrypoint, but with the substantial extra flexibility of reading into any type implementing an appropriate `Deserialize`.
The functionality in this crate essentially supersedes nom-bibtex.
The only feature of nom-bibtex that we do not support is the capturing of comments not explicitly contained in a `@comment` entry.
The functionality in this crate essentially supersedes bibparser.
The benchmark code can be found in `benches/compare.rs`.
The bibliography file used is `assets/tugboat.bib`, which is part of the testing data used by biber.
It is a 2.64 MB, 73,993-line `.bib` file.
- `ignore`: Deserialize using `serde::de::IgnoredAny` to parse the file but ignore the contents.
- `struct`: Deserialize using a struct with entries capturing every field present in `assets/tugboat.bib` (15 fields total), expanding macros and collapsing field values.
- `borrow`: Deserialize into a fully borrowed Rust type which captures all data in the file but does not expand macros or collapse field values.
- `biblatex`: Parse using `biblatex::RawBibliography::parse` (most similar to `borrow`).
- `copy`: Deserialize into an owned Rust type with macro expansion, field value collapsing, and case-insensitive comparison where appropriate.
- `nom-bibtex`: Parse using `nom_bibtex::Bibtex::parse` (most similar to `copy`).

The benchmarks were performed on an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz (2019 MacBook Pro).
The speedup factor is relative to `biblatex`.
benchmark | factor | runtime | throughput |
---|---|---|---|
ignore | 4.8x | [3.3923 ms 3.3987 ms 3.4058 ms] | 660 MB/s |
struct | 1.9x | [8.5496 ms 8.7481 ms 8.9924 ms] | 300 MB/s |
borrow | 1.3x | [12.932 ms 12.962 ms 12.992 ms] | 200 MB/s |
biblatex | 1.0x | [16.184 ms 16.224 ms 16.266 ms] | 160 MB/s |
copy | 0.75x | [21.455 ms 21.690 ms 21.935 ms] | 120 MB/s |
nom-bibtex | 0.23x | [71.607 ms 71.912 ms 72.343 ms] | 40 MB/s |
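
The factor column can be reproduced from the runtime column. The sketch below assumes that the middle value of each runtime triple is the point estimate and that the factor is the `biblatex` estimate divided by the benchmark's estimate. The helper name `speedup` is illustrative and does not come from `benches/compare.rs`.

```rust
/// Speedup of a benchmark relative to a baseline:
/// baseline runtime divided by benchmark runtime.
fn speedup(baseline_ms: f64, runtime_ms: f64) -> f64 {
    baseline_ms / runtime_ms
}

fn main() {
    // Middle runtime values (ms) copied from the table; biblatex is the baseline.
    let baseline = 16.224;
    let rows = [
        ("ignore", 3.3987),
        ("struct", 8.7481),
        ("borrow", 12.962),
        ("biblatex", 16.224),
        ("copy", 21.690),
        ("nom-bibtex", 71.912),
    ];

    for &(name, ms) in rows.iter() {
        println!("{name:<10} {:.2}x", speedup(baseline, ms));
    }
}
```

A factor above 1 means the benchmark ran faster than `biblatex`, and a factor below 1 means it ran slower.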
The bibparser crate is not included in this benchmark as it is unable to parse the input file.
This crate uses some `unsafe` code for string conversions in cases where we can guarantee, for other reasons, that a string slice begins and ends at a valid codepoint boundary.
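
One common situation where such a guarantee holds is splitting at an ASCII delimiter found by a byte-level scan. The sketch below is an illustration under that assumption, not code taken from this crate, and `key_before_equals` is a hypothetical helper.

```rust
/// Return the text before the first `=` in `input`, if any.
///
/// The input is scanned as raw bytes. `=` is an ASCII byte, and ASCII bytes
/// never occur inside a multi-byte UTF-8 sequence, so the index found is
/// always a character boundary.
fn key_before_equals(input: &str) -> Option<&str> {
    let bytes = input.as_bytes();
    let idx = bytes.iter().position(|&b| b == b'=')?;
    // SAFETY: `input` is valid UTF-8 and `idx` is the position of an ASCII
    // byte, hence a character boundary, so `bytes[..idx]` is valid UTF-8.
    Some(unsafe { std::str::from_utf8_unchecked(&bytes[..idx]) })
}

fn main() {
    // The key contains a multi-byte character; the split is still sound.
    let key = key_before_equals("título = Ensayo");
    println!("{key:?}");
}
```

The safe equivalent, `std::str::from_utf8`, would re-validate bytes that are already known to be valid, which is the cost the unchecked conversion avoids.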