| Crates.io | kreuzberg |
| lib.rs | kreuzberg |
| version | 4.1.2 |
| created_at | 2025-12-09 09:00:58.072831+00 |
| updated_at | 2026-01-25 12:36:35.258038+00 |
| description | High-performance document intelligence library for Rust. Extract text, metadata, and structured data from PDFs, Office documents, images, and 50+ formats with async/sync APIs. |
| homepage | https://kreuzberg.dev |
| repository | https://github.com/kreuzberg-dev/kreuzberg |
| max_upload_size | |
| id | 1975144 |
| size | 4,047,744 |
High-performance document intelligence library for Rust. Extract text, metadata, and structured information from PDFs, Office documents, images, and 56 formats.
This is the core Rust library that powers the Python, TypeScript, and Ruby bindings.
🚀 Version 4.1.2 Release This is a pre-release version. We invite you to test the library and report any issues you encounter.
Note: The Rust crate is not currently published to crates.io for this RC. Use git dependencies or language bindings (Python, TypeScript, Ruby) instead.
[dependencies]
kreuzberg = "4.0"
tokio = { version = "1", features = ["rt", "macros"] }
Kreuzberg offers flexible PDFium linking strategies for different deployment scenarios. Note: Language bindings (Python, TypeScript, Ruby, Java, Go, C#, PHP, Elixir) automatically bundle PDFium—no configuration needed. This section applies only to the Rust crate.
| Strategy | Feature Flag | Description | Use Case |
|---|---|---|---|
| Default (Dynamic) | None | Links to system PDFium at runtime | Development, system package users |
| Static | pdf-static |
Statically links PDFium into binary | Single binary distribution, no runtime dependencies |
| Bundled | pdf-bundled |
Downloads and embeds PDFium in binary | CI/CD, hermetic builds, largest binary size |
| System | pdf-system |
Uses system PDFium via pkg-config | Linux distributions with PDFium package |
Example Cargo.toml configurations:
# Default (dynamic linking)
[dependencies]
kreuzberg = "4.0"
# Static linking
[dependencies]
kreuzberg = { version = "4.0", features = ["pdf-static"] }
# Bundled in binary
[dependencies]
kreuzberg = { version = "4.0", features = ["pdf-bundled"] }
# System library (requires PDFium installed)
[dependencies]
kreuzberg = { version = "4.0", features = ["pdf-system"] }
For more details on feature flags and configuration options, see the Features documentation.
If using embeddings functionality, ONNX Runtime must be installed:
# macOS
brew install onnxruntime
# Ubuntu/Debian
sudo apt install libonnxruntime libonnxruntime-dev
# Windows (MSVC)
scoop install onnxruntime
# OR download from https://github.com/microsoft/onnxruntime/releases
Without ONNX Runtime, embeddings will raise MissingDependencyError with installation instructions.
use kreuzberg::{extract_file_sync, ExtractionConfig};
fn main() -> kreuzberg::Result<()> {
let config = ExtractionConfig::default();
let result = extract_file_sync("document.pdf", None, &config)?;
println!("{}", result.content);
Ok(())
}
use kreuzberg::{extract_file, ExtractionConfig};
#[tokio::main]
async fn main() -> kreuzberg::Result<()> {
let config = ExtractionConfig::default();
let result = extract_file("document.pdf", None, &config).await?;
println!("{}", result.content);
Ok(())
}
use kreuzberg::{batch_extract_file, ExtractionConfig};
#[tokio::main]
async fn main() -> kreuzberg::Result<()> {
let config = ExtractionConfig::default();
let files = vec!["doc1.pdf", "doc2.pdf", "doc3.pdf"];
let results = batch_extract_file(&files, None, &config).await?;
for result in results {
println!("{}", result.content);
}
Ok(())
}
use kreuzberg::{extract_file_sync, ExtractionConfig, OcrConfig, TesseractConfig};
fn main() -> kreuzberg::Result<()> {
let config = ExtractionConfig {
ocr: Some(OcrConfig {
backend: "tesseract".to_string(),
language: "eng".to_string(),
tesseract_config: Some(TesseractConfig {
enable_table_detection: true,
..Default::default()
}),
}),
..Default::default()
};
let result = extract_file_sync("invoice.pdf", None, &config)?;
for table in &result.tables {
println!("{}", table.markdown);
}
Ok(())
}
use kreuzberg::{extract_file_sync, ExtractionConfig, PdfConfig};
fn main() -> kreuzberg::Result<()> {
let config = ExtractionConfig {
pdf_options: Some(PdfConfig {
passwords: Some(vec!["password1".to_string(), "password2".to_string()]),
..Default::default()
}),
..Default::default()
};
let result = extract_file_sync("protected.pdf", None, &config)?;
Ok(())
}
use kreuzberg::{extract_bytes_sync, ExtractionConfig};
use std::fs;
fn main() -> kreuzberg::Result<()> {
let data = fs::read("document.pdf")?;
let config = ExtractionConfig::default();
let result = extract_bytes_sync(&data, "application/pdf", &config)?;
println!("{}", result.content);
Ok(())
}
The crate uses feature flags for optional functionality:
[dependencies]
kreuzberg = { version = "4.0", features = ["pdf", "excel", "ocr"] }
| Feature | Description | Binary Size |
|---|---|---|
pdf |
PDF extraction via pdfium | +25MB |
excel |
Excel/spreadsheet parsing | +3MB |
office |
DOCX, PPTX extraction | +1MB |
email |
EML, MSG extraction | +500KB |
html |
HTML to markdown | +1MB |
xml |
XML streaming parser | +500KB |
archives |
ZIP, TAR, 7Z extraction | +2MB |
ocr |
OCR with Tesseract | +5MB |
language-detection |
Language detection | +100KB |
chunking |
Text chunking | +200KB |
quality |
Text quality processing | +500KB |
kreuzberg = { version = "4.0", features = ["full"] }
kreuzberg = { version = "4.0", features = ["server"] }
kreuzberg = { version = "4.0", features = ["cli"] }
Kreuzberg supports three PDFium linking strategies. Default is bundled-pdfium (best developer experience).
| Strategy | Feature | Use Case | Binary Size | Runtime Deps |
|---|---|---|---|---|
| Bundled (default) | bundled-pdfium |
Development, production | +8-15MB | None |
| Static | static-pdfium |
Docker, musl, standalone binaries | +200MB | None |
| System | system-pdfium |
Package managers, distros | +2MB | libpdfium.so |
# Default - bundled PDFium (recommended)
[dependencies]
kreuzberg = "4.0"
# Static linking (Docker, musl)
[dependencies]
kreuzberg = { version = "4.0", features = ["static-pdfium"] }
# System PDFium (package managers)
[dependencies]
kreuzberg = { version = "4.0", features = ["system-pdfium"] }
For detailed information, see the PDFium Linking Guide.
Note: Language bindings (Python, TypeScript, Ruby, Java, Go) automatically bundle PDFium. No configuration needed.
API Documentation – Complete API reference with examples
https://docs.kreuzberg.dev – User guide and tutorials
MIT License - see LICENSE for details.