biblib

version: 0.3.2
created_at: 2025-01-25 10:39:19 UTC
updated_at: 2025-12-30 07:27:24 UTC
description: Parse, manage, and deduplicate academic citations
repository: https://github.com/AliAzlanDev/biblib
documentation: https://docs.rs/biblib
size: 338,200
Ali Azlan (AliAzlanDev)


biblib


A Rust library for parsing and deduplicating academic citations.

Installation

[dependencies]
biblib = "0.3.0"

For a minimal build, disable the default features and enable only the formats you need (here, RIS only):

[dependencies]
biblib = { version = "0.3.0", default-features = false, features = ["ris"] }

Supported Formats

Format        Feature   Description
RIS           ris       Research Information Systems format
PubMed        pubmed    MEDLINE/PubMed .nbib files
EndNote XML   xml       EndNote XML export format
CSV           csv       Configurable delimited files

All format features are enabled by default.
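Choosing which features to keep in a minimal build usually starts from the files you actually receive. The helper below is a hypothetical sketch (feature_for_path is not part of biblib) that maps a file extension to the format and feature name from the table above. Only .nbib comes from this README; .ris, .xml, .csv, and .tsv are assumed conventions, not biblib rules.

```rust
use std::path::Path;

// Hypothetical helper, not a biblib API: guess the format and Cargo feature
// from a file extension. Extensions other than .nbib are assumed conventions.
fn feature_for_path(path: &str) -> Option<(&'static str, &'static str)> {
    // Compare extensions case-insensitively ("NBIB" and "nbib" are the same).
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "ris" => Some(("RIS", "ris")),
        "nbib" => Some(("PubMed", "pubmed")),
        "xml" => Some(("EndNote XML", "xml")),
        "csv" | "tsv" => Some(("CSV", "csv")),
        _ => None,
    }
}

fn main() {
    for file in ["export.ris", "pubmed-results.NBIB", "library.xml", "refs.csv", "notes.txt"] {
        match feature_for_path(file) {
            Some((format, feature)) => println!("{file}: {format} (feature \"{feature}\")"),
            None => println!("{file}: unknown format"),
        }
    }
}
```

Collecting the feature names this way for a batch of files gives the list to put in features = [...] when default features are disabled.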

Quick Start

Parsing Citations

use biblib::{CitationParser, RisParser};

let ris_content = r#"TY  - JOUR
TI  - Machine Learning in Healthcare
AU  - Smith, John
AU  - Doe, Jane
PY  - 2023
ER  -"#;

let parser = RisParser::new();
let citations = parser.parse(ris_content).unwrap();

println!("Title: {}", citations[0].title);
println!("Authors: {:?}", citations[0].authors);
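The RIS sample above relies on a fixed line layout: a two-character tag, two spaces, a hyphen, then the value, with ER closing the record. To illustrate that layout only (this is not biblib's parser, and split_ris_line and collect_tag are hypothetical names), a minimal tag splitter could look like this:

```rust
// Hypothetical sketch of the RIS line layout, not biblib's implementation.
// Splits "AU  - Smith, John" into ("AU", "Smith, John").
fn split_ris_line(line: &str) -> Option<(&str, &str)> {
    // str::get returns None instead of panicking on short or non-ASCII input.
    let tag = line.get(0..2)?;
    if line.get(2..5)? != "  -" {
        return None;
    }
    // Everything after the hyphen is the value; "ER  -" yields an empty value.
    let value = line[5..].trim();
    Some((tag, value))
}

// Collect every value for one tag (e.g. all AU lines) in document order.
fn collect_tag<'a>(record: &'a str, wanted: &str) -> Vec<&'a str> {
    record
        .lines()
        .filter_map(split_ris_line)
        .filter(|(tag, _)| *tag == wanted)
        .map(|(_, value)| value)
        .collect()
}

fn main() {
    let record = "TY  - JOUR\nTI  - Machine Learning in Healthcare\nAU  - Smith, John\nAU  - Doe, Jane\nPY  - 2023\nER  -";
    println!("Title: {:?}", collect_tag(record, "TI"));
    println!("Authors: {:?}", collect_tag(record, "AU"));
}
```

Repeated tags such as AU accumulate into a list, which matches the README's authors field being a Vec rather than a single value.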

Auto-Detecting Format

use biblib::detect_and_parse;

let content = "TY  - JOUR\nTI  - Example\nER  -";
let (citations, format) = detect_and_parse(content).unwrap();

println!("Detected format: {}", format); // "RIS"
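detect_and_parse chooses the parser for you, and its actual detection rules are not described in this README. The function below is a hypothetical heuristic, written only to show the general idea of sniffing distinctive markers: a leading angle bracket for XML, a TY tag for RIS, a PMID- line for MEDLINE/PubMed, and a delimiter in the first line for CSV. guess_format and its rule order are assumptions.

```rust
// Hypothetical format sniffing, for illustration only; biblib's
// detect_and_parse may use different rules and a different return type.
fn guess_format(content: &str) -> Option<&'static str> {
    let trimmed = content.trim_start();
    if trimmed.starts_with('<') {
        // EndNote exports are XML documents.
        Some("EndNote XML")
    } else if trimmed.lines().any(|l| l.starts_with("TY  -")) {
        // Every RIS record opens with a TY (type) tag.
        Some("RIS")
    } else if trimmed.lines().any(|l| l.starts_with("PMID-")) {
        // MEDLINE/PubMed records carry a PMID- line.
        Some("PubMed")
    } else if trimmed
        .lines()
        .next()
        .map_or(false, |first| first.contains(|c: char| c == ',' || c == ';' || c == '\t'))
    {
        // A delimited header row suggests CSV.
        Some("CSV")
    } else {
        None
    }
}

fn main() {
    let samples = [
        "TY  - JOUR\nTI  - Example\nER  -",
        "PMID- 12345678\nTI  - Example",
        "<?xml version=\"1.0\"?><xml><records></records></xml>",
        "Article Name;Writers\nMy Paper;Smith J",
    ];
    for s in samples {
        println!("{:?}", guess_format(s));
    }
}
```

Checking the more specific markers before the CSV fallback matters, because RIS and PubMed lines can themselves contain commas.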

Deduplicating Citations

use biblib::dedupe::{Deduplicator, DeduplicatorConfig};

let config = DeduplicatorConfig {
    group_by_year: true,      // Group by year for performance
    run_in_parallel: true,    // Use parallel processing
    source_preferences: vec!["PubMed".to_string()], // Prefer PubMed records
};

let deduplicator = Deduplicator::new().with_config(config);
// `citations` is a Vec<Citation> returned by one of the parsers shown earlier
let groups = deduplicator.find_duplicates(&citations).unwrap();

for group in groups {
    if !group.duplicates.is_empty() {
        println!("Kept: {}", group.unique.title);
        println!("Duplicates: {}", group.duplicates.len());
    }
}
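The group_by_year option limits comparisons to records from the same year, which cuts down the number of pairwise checks. The sketch below illustrates that idea together with a simple title-similarity test. It is not biblib's matching algorithm (the Deduplication Guide documents the real one). Record, the normalization rules, and the 0.9 threshold are all assumptions, and Levenshtein distance is written inline to keep the example dependency-free, even though biblib's dedupe feature pulls in strsim.

```rust
use std::collections::HashMap;

// Hypothetical minimal record; biblib's Citation carries many more fields.
struct Record {
    title: String,
    year: Option<i32>,
}

// Lowercase and collapse every run of non-alphanumeric characters to one space.
fn normalize(title: &str) -> String {
    title
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

// Levenshtein edit distance over chars, using a single DP row.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    let mut row: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut prev_diag = row[0];
        row[0] = i + 1;
        for (j, &cb) in b_chars.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let next = (row[j + 1] + 1).min(row[j] + 1).min(prev_diag + cost);
            prev_diag = row[j + 1];
            row[j + 1] = next;
        }
    }
    row[b_chars.len()]
}

// 1.0 for identical strings, falling toward 0.0 as the edit distance grows.
fn similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

// Compare titles only within the same year bucket (records without a year
// share one bucket) and return index pairs that meet the threshold.
fn find_duplicate_pairs(records: &[Record], threshold: f64) -> Vec<(usize, usize)> {
    let normalized: Vec<String> = records.iter().map(|r| normalize(&r.title)).collect();
    let mut by_year: HashMap<Option<i32>, Vec<usize>> = HashMap::new();
    for (i, r) in records.iter().enumerate() {
        by_year.entry(r.year).or_default().push(i);
    }
    let mut pairs = Vec::new();
    for indices in by_year.values() {
        for (k, &i) in indices.iter().enumerate() {
            for &j in &indices[k + 1..] {
                if similarity(&normalized[i], &normalized[j]) >= threshold {
                    pairs.push((i, j));
                }
            }
        }
    }
    pairs.sort(); // HashMap iteration order is unspecified
    pairs
}

fn main() {
    let records = vec![
        Record { title: "Machine Learning in Healthcare".into(), year: Some(2023) },
        Record { title: "Machine learning in healthcare.".into(), year: Some(2023) },
        Record { title: "Deep Learning for Radiology".into(), year: Some(2023) },
        Record { title: "Machine Learning in Healthcare".into(), year: Some(2021) },
    ];
    for (i, j) in find_duplicate_pairs(&records, 0.9) {
        println!("Likely duplicates: {:?} / {:?}", records[i].title, records[j].title);
    }
}
```

Note the trade-off that year grouping introduces: the 2021 record is never compared with the 2023 ones, so a duplicate whose year was entered differently in two databases would be missed.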

CSV with Custom Headers

use biblib::csv::{CsvParser, CsvConfig};
use biblib::CitationParser;

let mut config = CsvConfig::new();
config
    .set_delimiter(b';')
    .set_header_mapping("title", vec!["Article Name".to_string()])
    .set_header_mapping("authors", vec!["Writers".to_string()]);

let parser = CsvParser::with_config(config);
let citations = parser.parse("Article Name;Writers\nMy Paper;Smith J").unwrap();
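set_header_mapping tells the CSV parser which column names to treat as which citation field. The sketch below shows one way such a lookup could turn aliases into column indices. resolve_columns is a hypothetical name, and the case-insensitive, whitespace-trimmed comparison is an assumption, not documented biblib behavior.

```rust
use std::collections::HashMap;

// Hypothetical sketch: map each field name to the index of the first header
// that matches one of its aliases (trimmed, case-insensitive; an assumption).
fn resolve_columns(
    header_line: &str,
    delimiter: char,
    mapping: &HashMap<&str, Vec<&str>>,
) -> HashMap<String, usize> {
    let headers: Vec<String> = header_line
        .split(delimiter)
        .map(|h| h.trim().to_lowercase())
        .collect();
    let mut columns = HashMap::new();
    for (field, aliases) in mapping {
        let found = headers
            .iter()
            .position(|h| aliases.iter().any(|a| a.trim().to_lowercase() == *h));
        if let Some(idx) = found {
            columns.insert(field.to_string(), idx);
        }
    }
    columns
}

fn main() {
    // Same mapping as the CsvConfig example above.
    let mut mapping: HashMap<&str, Vec<&str>> = HashMap::new();
    mapping.insert("title", vec!["Article Name"]);
    mapping.insert("authors", vec!["Writers"]);

    let data = "Article Name;Writers\nMy Paper;Smith J";
    let mut lines = data.lines();
    let header = lines.next().unwrap_or("");
    let columns = resolve_columns(header, ';', &mapping);

    for row in lines {
        let cells: Vec<&str> = row.split(';').collect();
        // Look up each mapped field's cell; unmapped fields print as None.
        let title = columns.get("title").and_then(|&i| cells.get(i));
        let authors = columns.get("authors").and_then(|&i| cells.get(i));
        println!("title={:?} authors={:?}", title, authors);
    }
}
```

Allowing several aliases per field lets one configuration read exports from tools that name the same column differently.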

Citation Fields

Each parsed citation contains:

Field          Type             Description
title          String           Work title
authors        Vec<Author>      Authors with name, given name, affiliations
journal        Option<String>   Full journal name
journal_abbr   Option<String>   Journal abbreviation
date           Option<Date>     Year, month, day
volume         Option<String>   Volume number
issue          Option<String>   Issue number
pages          Option<String>   Page range
doi            Option<String>   Digital Object Identifier
pmid           Option<String>   PubMed ID
pmc_id         Option<String>   PubMed Central ID
issn           Vec<String>      ISSNs
abstract_text  Option<String>   Abstract
keywords       Vec<String>      Keywords
urls           Vec<String>      Related URLs
mesh_terms     Vec<String>      MeSH terms (PubMed)
extra_fields   HashMap          Additional format-specific fields
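Most of these fields are Option or Vec because each format records different metadata. The example below uses a hypothetical CitationSketch struct holding a few of those fields (so it compiles without biblib) to show a common pattern: falling back through identifiers with Option::or_else. The DOI, PMID, PMC ID, URL precedence is an illustrative choice, not something biblib prescribes.

```rust
// Hypothetical, trimmed-down mirror of a few fields from the table above;
// it exists only so this example compiles without the biblib crate.
#[derive(Default)]
struct CitationSketch {
    title: String,
    doi: Option<String>,
    pmid: Option<String>,
    pmc_id: Option<String>,
    urls: Vec<String>,
}

// Pick one identifier: DOI first, then PMID, then PMC ID, then the first URL.
fn best_identifier(c: &CitationSketch) -> Option<String> {
    c.doi
        .as_ref()
        .map(|d| format!("doi:{}", d))
        .or_else(|| c.pmid.as_ref().map(|p| format!("pmid:{}", p)))
        .or_else(|| c.pmc_id.as_ref().map(|p| format!("pmc:{}", p)))
        .or_else(|| c.urls.first().cloned())
}

fn main() {
    let records = vec![
        CitationSketch {
            title: "Has a DOI".into(),
            doi: Some("10.1000/example".into()),
            pmid: Some("12345".into()),
            ..Default::default()
        },
        CitationSketch { title: "PubMed only".into(), pmid: Some("67890".into()), ..Default::default() },
        CitationSketch { title: "No identifiers".into(), ..Default::default() },
    ];
    for c in &records {
        match best_identifier(c) {
            Some(id) => println!("{}: {}", c.title, id),
            None => println!("{}: no identifier", c.title),
        }
    }
}
```

Because or_else takes a closure, later fallbacks are only evaluated when every earlier field is None.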

Features

Feature   Dependencies    Description
ris       none            RIS format parser
pubmed    none            PubMed/MEDLINE parser
xml       quick-xml       EndNote XML parser
csv       csv             CSV parser
dedupe    rayon, strsim   Deduplication engine
regex     regex           Full regex support
lite      regex-lite      Lightweight regex (smaller binary)

Default: all features enabled except lite.

Documentation

  • Parsing Guide — Format-specific tag mappings, date formats, and author handling
  • Deduplication Guide — Matching algorithm, similarity thresholds, and configuration
  • API Docs — Complete API reference

License

Licensed under either of Apache License 2.0 or MIT at your option.
