pdf-rs

Crates.iopdf-rs
lib.rspdf-rs
version0.1.7-dev
created_at2025-12-14 08:49:59.572027+00
updated_at2026-01-14 14:17:02.993995+00
descriptionA PDF parsing library written in Rust
homepagehttps://github.com/LeoPlus1024/pdf-rs
repositoryhttps://github.com/LeoPlus1024/pdf-rs
max_upload_size
id1984034
size158,316
Bojun Yang (LeoPlus1024)

documentation

https://docs.rs/pdf-rs

README

pdf-rs

⚠️ Warning: This project is currently under active development. APIs may change frequently and are not yet stable.

Build Status License Crates.io

English | 中文

A PDF parsing library written in Rust.

Overview

pdf-rs is a Rust library for parsing PDF files. The project aims to provide parsing functionality for PDF document structures, including:

  • PDF version identification
  • Cross-reference table (xref) parsing
  • Object parsing (dictionaries, arrays, strings, etc.)
  • Basic access to PDF structures

Features

  1. PDF Version Support: Supports PDF versions from 1.0 to 2.0
  2. Object Parsing: Parses various object types in PDF, including dictionaries, arrays, strings, etc.
  3. Cross-reference Table Parsing: Parses PDF's xref table to locate objects
  4. Stream Reading: Uses Sequence trait for efficient streaming file reading
  5. Memory Efficiency: Designed to minimize memory usage during parsing
  6. Error Handling: Comprehensive error handling with detailed error messages
  7. Type Safety: Fully utilizes Rust's type system for safety guarantees

Installation

Add this to your Cargo.toml:

[dependencies]
pdf-rs = "0.1"

Usage Example

use std::path::PathBuf;
use pdf_rs::document::PDFDocument;

// Create PDF document parser
let path = PathBuf::from("example.pdf");
let document = PDFDocument::open(path)?;

// Access PDF version
println!("PDF Version: {}", document.get_version());

// Get cross-reference table
let xrefs = document.get_xref();
println!("XRef entries: {}", xrefs.len());

API Documentation

For detailed API documentation, please refer to the crate documentation.

Module Structure

  • document: Main PDF document parsing functionality
  • objects: PDF object representations (dictionaries, arrays, strings, etc.)
  • parser: Core parsing logic for PDF objects
  • sequence: Streaming file reading utilities
  • tokenizer: Tokenization of PDF content
  • error: Error types and handling

Design Highlights

  • Modular Design: Different functionalities separated into different modules for easy maintenance and extension
  • Error Handling: Comprehensive error type system providing detailed error information
  • Memory Efficiency: Streaming reading avoids loading entire files into memory
  • Type Safety: Fully utilizes Rust's type system for safety guarantees
  • Extensibility: Designed with extensibility in mind for future enhancements

Current Status

The project is in early development stage. Basic PDF parsing functionality has been implemented, including version detection, xref table parsing, and basic object parsing.

Future Plans

  • Improve PDF object parsing functionality
  • Add encrypted PDF support
  • Implement advanced PDF content extraction features
  • Provide more user-friendly API interfaces
  • Add comprehensive documentation with examples
  • Improve performance and memory usage

Build Requirements

  • Rust 1.5+ (latest stable version recommended)

Build Steps

cargo build

Run Tests

cargo test

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository

  2. Create your feature branch (git checkout -b feature/AmazingFeature)

  3. Commit your changes (git commit -m 'Add some AmazingFeature')

  4. Push to the branch (git push origin feature/AmazingFeature)

  5. Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

  • Thanks to the Rust community for providing excellent tools and libraries
  • Inspired by other PDF parsing libraries in different languages
Commit count: 72

cargo fmt