| Crates.io | oar-ocr-vl |
| lib.rs | oar-ocr-vl |
| version | 0.6.0 |
| created_at | 2026-01-08 12:48:05.7683+00 |
| updated_at | 2026-01-25 07:16:09.560268+00 |
| description | Vision-Language models for oar-ocr |
| homepage | https://github.com/greatv/oar-ocr |
| repository | https://github.com/greatv/oar-ocr |
| max_upload_size | |
| id | 2030134 |
| size | 517,063 |
Vision-Language models for document understanding in Rust.
This crate provides PaddleOCR-VL, UniRec, HunyuanOCR, and LightOnOCR implementations using Candle for native Rust inference.
PaddleOCR-VL is an ultra-compact (0.9B parameters) Vision-Language Model for document parsing, released by Baidu's PaddlePaddle team. It supports 109 languages and excels in recognizing complex elements including text, tables, formulas, and 11 chart types.
UniRec is a unified recognition model with only 0.1B parameters, developed by the FVL Laboratory at Fudan University. It is designed for high-accuracy and efficient recognition of plain text, mathematical formulas, and mixed content in both Chinese and English.
HunyuanOCR is a 1B parameter OCR expert VLM powered by Hunyuan's multimodal architecture. This crate provides native Rust inference for the model_type=hunyuan_vl checkpoint.
LightOnOCR-2 is an efficient end-to-end OCR VLM for extracting clean text from document images without an external pipeline.
The crate also provides a two-stage document parsing API that combines layout detection (ONNX) with VL-based recognition, supporting the UniRec, PaddleOCR-VL, HunyuanOCR, and LightOnOCR backends.
Add oar-ocr-vl to your project:
```shell
cargo add oar-ocr-vl
```
To enable GPU acceleration (CUDA), add the feature flag:
```shell
cargo add oar-ocr-vl --features cuda
```
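With the `cuda` feature enabled, the device can be selected at runtime with a CPU fallback. A minimal sketch using candle's `Device` API (the fallback pattern is an illustration, not part of this crate's API):

```rust
use candle_core::Device;

// Prefer the first CUDA GPU when available (requires the `cuda` feature),
// otherwise fall back to CPU inference.
let device = Device::new_cuda(0).unwrap_or(Device::Cpu);
```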
Use PaddleOCR-VL to recognize a specific aspect of an image (e.g., just the table or text).
```rust
use oar_ocr_core::utils::load_image;
use oar_ocr_vl::{PaddleOcrVl, PaddleOcrVlTask};

let image = load_image("document.png")?;
let device = candle_core::Device::Cpu; // or Device::new_cuda(0)?

// Initialize model
let model = PaddleOcrVl::from_dir("PaddleOCR-VL", device)?;

// Perform OCR
let result = model.generate(image, PaddleOcrVlTask::Ocr, 256)?;
println!("Result: {}", result);
```
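The same API covers the other element types by switching the task variant. As a sketch, table recognition would look like the following; the `Table` variant name is an assumption here, inferred from the CLI's `--task table` flag:

```rust
use oar_ocr_core::utils::load_image;
use oar_ocr_vl::{PaddleOcrVl, PaddleOcrVlTask};

let image = load_image("table.png")?;
let device = candle_core::Device::Cpu;
let model = PaddleOcrVl::from_dir("PaddleOCR-VL", device)?;

// Table recognition; variant name assumed to mirror the CLI's `--task table`.
let result = model.generate(image, PaddleOcrVlTask::Table, 512)?;
println!("Result: {}", result);
```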
UniRec is a unified model that handles text, mathematical formulas, and table structures in a single pass without needing task-specific prompts.
```rust
use oar_ocr_core::utils::load_image;
use oar_ocr_vl::UniRec;

let image = load_image("mixed_content.png")?;
let device = candle_core::Device::Cpu;

// Initialize model
let model = UniRec::from_dir("models/unirec-0.1b", device)?;

// Generate content (automatically handles text, formulas, etc.)
let result = model.generate(image, 512)?;
println!("Result: {}", result);
```
Combine layout detection with a VLM backend to parse an entire page into Markdown.
```rust
use oar_ocr_core::utils::load_image;
use oar_ocr_core::predictors::LayoutDetectionPredictor;
use oar_ocr_vl::{DocParser, UniRec};

let device = candle_core::Device::Cpu;

// 1. Setup Layout Detector
let layout_predictor = LayoutDetectionPredictor::builder()
    .model_name("pp-doclayoutv2")
    .build("pp-doclayoutv2.onnx")?;

// 2. Setup Recognition Backend (UniRec or PaddleOCR-VL)
let unirec = UniRec::from_dir("models/unirec-0.1b", device)?;
let parser = DocParser::new(&unirec);

// 3. Parse Document
let image = load_image("page.jpg")?;
let result = parser.parse(&layout_predictor, image)?;

// 4. Output as Markdown
println!("{}", result.to_markdown());
```
The oar-ocr-vl crate includes several examples demonstrating its capabilities.
This example combines layout detection (ONNX) with a VLM for recognition.
```shell
cargo run --release --features cuda --example doc_parser -- \
  --model-name unirec \
  --model-dir models/unirec-0.1b \
  --layout-model models/pp-doclayoutv2.onnx \
  --device cuda \
  document.jpg
```
Run the UniRec model directly on an image.
```shell
cargo run --release --features cuda --example unirec -- \
  --model-dir models/unirec-0.1b \
  --device cuda \
  formula.png
```
Run the PaddleOCR-VL model directly on an image with a specific task prompt.
```shell
# OCR task
cargo run --release --features cuda --example paddleocr_vl -- \
  --model-dir PaddleOCR-VL \
  --device cuda \
  --task ocr \
  document.jpg

# Table task
cargo run --release --features cuda --example paddleocr_vl -- \
  --model-dir PaddleOCR-VL \
  --device cuda \
  --task table \
  table.jpg
```
Run the HunyuanOCR model directly on an image with a custom prompt.

```shell
cargo run --release --features cuda --example hunyuanocr -- \
  --model-dir ~/repos/HunyuanOCR \
  --device cuda \
  --prompt "Detect and recognize text in the image, and output the text coordinates in a formatted manner." \
  document.jpg
```
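HunyuanOCR can presumably also be driven from Rust in the same style as the other backends. The sketch below is an assumption by analogy: the `HunyuanOcr` type name, its `from_dir` constructor, and the prompt-taking `generate` signature are inferred from the crate's other models and are not confirmed by this README.

```rust
use oar_ocr_core::utils::load_image;
use oar_ocr_vl::HunyuanOcr; // type name assumed by analogy with the other backends

let image = load_image("document.jpg")?;
let device = candle_core::Device::Cpu;
let model = HunyuanOcr::from_dir("HunyuanOCR", device)?;

// HunyuanOCR is prompt-driven, so the instruction is passed alongside the image
// (signature assumed; check the `hunyuanocr` example for the actual API).
let result = model.generate(image, "Extract the text in the image.", 512)?;
println!("{}", result);
```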