rsrpp

Crates.io: rsrpp
lib.rs: rsrpp
version: 1.0.18
created_at: 2024-11-07 13:51:38.877688+00
updated_at: 2025-09-14 05:37:37.288348+00
description: A Rust project for research paper pdf.
homepage:
repository: https://github.com/akitenkrad/rsrpp.git
max_upload_size:
id: 1439779
size: 161,583
akitenkrad (akitenkrad)

Rust Research Paper Parser (rsrpp)

The rsrpp library provides a set of tools for parsing research papers.

Quick Start

Pre-requirements

  • Poppler: sudo apt install poppler-utils
  • OpenCV: sudo apt install libopencv-dev clang libclang-dev
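
Before building, it can help to confirm that the Poppler command-line tools are actually on the PATH. The following is a minimal sketch using only the standard library. The helper names `tool_available` and `missing_tools` are hypothetical and not part of rsrpp, and probing `pdftotext` and `pdfinfo` (both shipped in poppler-utils) is an assumption about which binaries matter.

```rust
use std::process::{Command, Stdio};

/// Hypothetical helper (not part of rsrpp): returns true if `tool` can be
/// spawned from the current PATH. Only the ability to start the process is
/// checked, not its exit status.
fn tool_available(tool: &str) -> bool {
    Command::new(tool)
        .arg("-v") // Poppler's command-line tools print their version with -v
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .is_ok()
}

/// Hypothetical helper: returns the subset of `tools` that could not be started.
fn missing_tools<'a>(tools: &[&'a str]) -> Vec<&'a str> {
    tools
        .iter()
        .copied()
        .filter(|tool| !tool_available(tool))
        .collect()
}
```

Calling `missing_tools(&["pdftotext", "pdfinfo"])` from `main` and printing the result gives a quick hint about whether `sudo apt install poppler-utils` is still needed.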

Installation

To start using the rsrpp library, add it to your project's dependencies. The following command updates the Cargo.toml file for you:

cargo add rsrpp

Then, import the necessary modules in your code:

extern crate rsrpp;
use rsrpp::parser;

Examples

Here is a simple example of how to use the parser module:

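// Run this inside an async function: the result of parse() must be awaited.
// serde_json must also be listed as a dependency for the last line.
let mut config = ParserConfig::new();
let url = "https://arxiv.org/pdf/1706.03762";
let pages = parse(url, &mut config).await.unwrap(); // Vec<Page>
let sections = Section::from_pages(&pages); // Vec<Section>
let json = serde_json::to_string(&sections).unwrap(); // String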

Tests

The library includes a set of tests to ensure its functionality. To run the tests, use the following command:

cargo test

License: MIT

Releases

1.0.16
  • Removed init_logger from rsrpp.
1.0.15
  • Fixed typo.
  • Introduced tracing logger.
1.0.14
  • Updated rsrpp version for rsrpp-cli.
1.0.13
  • Updated dependencies.
  • Removed build.sh because it required sudo when installing the crate.
1.0.12
  • Fixed a bug: removed an unused println!.
1.0.11
  • Fixed a bug in the XML loop so that it finishes when the end of the file is reached.
1.0.10
  • Added verbose mode.
  • Fixed a bug in the page number extraction process.
1.0.9
  • Updated: implemented new errors to handle invalid URLs.
1.0.8
  • Updated: The max retry time for saving PDF files has been increased.
1.0.7
  • Fixed a bug: after converting to PDF, the program now waits until processing is complete.
1.0.4
  • Fixed bugs in get_pdf_info.
  • Made minor improvements.
1.0.3
1.0.2
  • Updated the Section module. content: String was replaced by content: Vec<TextBlock>.
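
To illustrate the 1.0.2 change, here is a minimal sketch of code that used to read `content` as a single String and now has to work with a list of blocks. Only `content: Vec<TextBlock>` comes from the release note. The `title` field, the `text` field on `TextBlock`, and the `section_text` helper are hypothetical stand-ins, and the crate's real types may look different.

```rust
/// Hypothetical stand-in for the crate's text block type; the real fields may differ.
#[derive(Debug, Clone, PartialEq)]
struct TextBlock {
    text: String, // assumed field name
}

/// Hypothetical section shape after 1.0.2: `content` is a list of blocks
/// rather than one String.
#[derive(Debug, Clone, PartialEq)]
struct Section {
    title: String, // assumed field name
    content: Vec<TextBlock>,
}

/// Joins the blocks with newlines to recover a single String,
/// as code written against the older `content: String` might expect.
fn section_text(section: &Section) -> String {
    section
        .content
        .iter()
        .map(|block| block.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping the blocks separate preserves per-block boundaries, while a small helper like this one gives a flat string when that is all the caller needs.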