Crates.io | hayro-syntax |
lib.rs | hayro-syntax |
version | 0.3.0 |
created_at | 2025-06-08 10:32:13.229526+00 |
updated_at | 2025-09-09 12:40:21.708695+00 |
description | A low-level crate for reading PDF files. |
repository | https://github.com/LaurenzV/hayro |
id | 1704760 |
size | 510,694 |
A low-level library for reading PDF files.
This crate implements the Syntax chapter of the PDF reference, and therefore serves as a very good basis for building various abstractions on top of it, without having to reimplement the PDF parsing logic.
This crate does not provide more high-level functionality, such as parsing fonts or color spaces.
Such functionality is out-of-scope for hayro-syntax, since this crate is supposed to be as light-weight and application-agnostic as possible.
Functionality-wise, this crate is therefore close to feature-complete. The main missing features are support for encrypted and password-protected documents and improved support for JPEG2000 images. In addition, more low-level APIs might be added in the future.
This short example shows you how to load a PDF file and iterate over the content streams of all pages.
use hayro_syntax::Pdf;
use std::path::PathBuf;
use std::sync::Arc;

// First load the data that constitutes the PDF file.
let data = std::fs::read(
    PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../hayro/pdfs/text_with_rise.pdf"),
)
.unwrap();

// Then create a new PDF file from it.
//
// Here we are just unwrapping in case reading the file failed, but you
// might instead want to apply proper error handling.
let pdf = Pdf::new(Arc::new(data)).unwrap();

// First access all pages, and then iterate over the operators of each page's
// content stream and print them.
let pages = pdf.pages();
for page in pages.iter() {
    for op in page.typed_operations() {
        println!("{op:?}");
    }
}
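The example above unwraps on every failure. As a minimal sketch of the "proper error handling" its comment alludes to, the helper below reads the file using only the standard library and returns a descriptive error instead of panicking. The names `read_pdf_bytes` and `ReadError`, as well as the empty-file check, are assumptions made for illustration and are not part of hayro-syntax.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical error type for this sketch; not part of hayro-syntax.
#[derive(Debug)]
pub enum ReadError {
    /// The file could not be read from disk.
    Io { path: PathBuf, source: io::Error },
    /// The file was read successfully but contains no bytes.
    Empty { path: PathBuf },
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadError::Io { path, source } => {
                write!(f, "failed to read {}: {}", path.display(), source)
            }
            ReadError::Empty { path } => write!(f, "{} is empty", path.display()),
        }
    }
}

impl std::error::Error for ReadError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ReadError::Io { source, .. } => Some(source),
            ReadError::Empty { .. } => None,
        }
    }
}

/// Reads a file into the `Arc<Vec<u8>>` shape that the example above passes
/// to `Pdf::new`, returning an error instead of panicking.
pub fn read_pdf_bytes(path: &Path) -> Result<Arc<Vec<u8>>, ReadError> {
    // Attach the offending path to any I/O error so the caller can report it.
    let data = fs::read(path).map_err(|source| ReadError::Io {
        path: path.to_path_buf(),
        source,
    })?;
    // Assumption of this sketch: an empty file is rejected up front.
    if data.is_empty() {
        return Err(ReadError::Empty {
            path: path.to_path_buf(),
        });
    }
    Ok(Arc::new(data))
}
```

A caller would then `match` on the result of `read_pdf_bytes` and, on success, hand the bytes to `Pdf::new`, handling its `Result` in the same way rather than calling `.unwrap()`.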
There is one usage of unsafe, needed to implement caching using a self-referential struct. Other than that, there is no usage of unsafe, especially in any of the parser code.
This crate has one feature, jpeg2000. PDF allows for the insertion of JPEG2000 images. Unfortunately, JPEG2000 is a very complicated format. There exists a Rust jpeg2k crate that allows decoding such images. However, it is a relatively heavy dependency, has a lot of unsafe code (due to having been ported with c2rust), and also has a dependency on libc, meaning that you might be restricted in the targets you can build for. Because of this, I recommend not enabling this feature unless you absolutely need to be able to support such images.
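For readers unfamiliar with Cargo features, the following sketch shows the general pattern by which an optional feature can gate a heavy decoder at compile time. It is an illustration only: `decode_jpx` and its placeholder body are hypothetical and do not reflect how hayro-syntax wires up the jpeg2k crate internally.

```rust
/// Hypothetical decoder entry point, compiled only when the `jpeg2000`
/// feature is enabled. A real implementation would call into a JPEG2000
/// decoder here; this placeholder simply echoes its input.
#[cfg(feature = "jpeg2000")]
pub fn decode_jpx(data: &[u8]) -> Option<Vec<u8>> {
    Some(data.to_vec())
}

/// Fallback compiled when the feature is disabled: no decoder code is built,
/// and callers are told that the image cannot be decoded.
#[cfg(not(feature = "jpeg2000"))]
pub fn decode_jpx(_data: &[u8]) -> Option<Vec<u8>> {
    None
}
```

In a downstream Cargo.toml, such a feature is switched on by listing it in the dependency's `features` array, for example `hayro-syntax = { version = "0.3.0", features = ["jpeg2000"] }`. Leaving it out keeps the heavy dependency out of the build entirely.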
This crate is available under the Apache 2.0 license.