oar-ocr-vl

version: 0.6.0
created_at: 2026-01-08 12:48:05.7683+00
updated_at: 2026-01-25 07:16:09.560268+00
description: Vision-Language models for oar-ocr
homepage: https://github.com/greatv/oar-ocr
repository: https://github.com/greatv/oar-ocr
size: 517,063
owner: Wang Xin (GreatV)

README

oar-ocr-vl

Vision-Language models for document understanding in Rust.

This crate provides PaddleOCR-VL, UniRec, HunyuanOCR, and LightOnOCR implementations using Candle for native Rust inference.

Supported Models

PaddleOCR-VL

PaddleOCR-VL is an ultra-compact (0.9B parameters) Vision-Language Model for document parsing, released by Baidu's PaddlePaddle team. It supports 109 languages and excels in recognizing complex elements including text, tables, formulas, and 11 chart types.

UniRec

UniRec is a unified recognition model with only 0.1B parameters, developed by the FVL Laboratory at Fudan University. It is designed for high-accuracy and efficient recognition of plain text, mathematical formulas, and mixed content in both Chinese and English.

HunyuanOCR

HunyuanOCR is a 1B-parameter OCR expert VLM powered by Hunyuan's multimodal architecture. This crate provides native Rust inference for the model_type=hunyuan_vl checkpoint.
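
A hedged usage sketch in the same style as the examples under Usage below; the HunyuanOcr type name, the from_dir constructor, and the prompt-taking generate signature are assumptions rather than confirmed API (the CLI example under Running Examples shows the corresponding --prompt flag).

use oar_ocr_core::utils::load_image;
use oar_ocr_vl::HunyuanOcr; // NOTE: type name assumed

let image = load_image("document.png")?;
let device = candle_core::Device::Cpu; // Or Device::new_cuda(0)?

// Load the model_type=hunyuan_vl checkpoint from a local directory (constructor assumed)
let model = HunyuanOcr::from_dir("HunyuanOCR", device)?;

// Prompt-driven generation, mirroring the --prompt flag of the CLI example (signature assumed)
let result = model.generate(image, "Detect and recognize text in the image.", 512)?;
println!("Result: {}", result);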

LightOnOCR

LightOnOCR-2 is an efficient end-to-end OCR VLM for extracting clean text from document images without an external pipeline.

DocParser

Two-stage document parsing API that combines layout detection (ONNX) with VL-based recognition, supporting UniRec, PaddleOCR-VL, HunyuanOCR, and LightOnOCR backends.

Installation

Add oar-ocr-vl to your project:

cargo add oar-ocr-vl

To enable GPU acceleration (CUDA), add the feature flag:

cargo add oar-ocr-vl --features cuda
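
Equivalently, declare the dependency in Cargo.toml directly (version taken from this release; the cuda feature is optional and only needed for GPU builds):

[dependencies]
oar-ocr-vl = { version = "0.6", features = ["cuda"] }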

Usage

PaddleOCR-VL

Use PaddleOCR-VL to run a specific recognition task on an image (e.g., table or plain-text recognition).

use oar_ocr_core::utils::load_image;
use oar_ocr_vl::{PaddleOcrVl, PaddleOcrVlTask};

let image = load_image("document.png")?;
let device = candle_core::Device::Cpu; // Or Device::new_cuda(0)?

// Initialize model
let model = PaddleOcrVl::from_dir("PaddleOCR-VL", device)?;

// Perform OCR
let result = model.generate(image, PaddleOcrVlTask::Ocr, 256)?;
println!("Result: {}", result);

UniRec

UniRec is a unified model that handles text, mathematical formulas, and table structures in a single pass without needing task-specific prompts.

use oar_ocr_core::utils::load_image;
use oar_ocr_vl::UniRec;

let image = load_image("mixed_content.png")?;
let device = candle_core::Device::Cpu;

// Initialize model
let model = UniRec::from_dir("models/unirec-0.1b", device)?;

// Generate content (automatically handles text, formulas, etc.)
let result = model.generate(image, 512)?;
println!("Result: {}", result);

DocParser

Combine layout detection with a VLM backend to parse an entire page into Markdown.

use oar_ocr_core::utils::load_image;
use oar_ocr_core::predictors::LayoutDetectionPredictor;
use oar_ocr_vl::{DocParser, UniRec};

let device = candle_core::Device::Cpu;

// 1. Setup Layout Detector
let layout_predictor = LayoutDetectionPredictor::builder()
    .model_name("pp-doclayoutv2")
    .build("pp-doclayoutv2.onnx")?;

// 2. Setup Recognition Backend (UniRec or PaddleOCR-VL)
let unirec = UniRec::from_dir("models/unirec-0.1b", device)?;
let parser = DocParser::new(&unirec);

// 3. Parse Document
let image = load_image("page.jpg")?;
let result = parser.parse(&layout_predictor, image)?;

// 4. Output as Markdown
println!("{}", result.to_markdown());

Running Examples

The oar-ocr-vl crate includes several examples demonstrating its capabilities.

DocParser (Two-Stage Pipeline)

This example combines layout detection (ONNX) with a VLM for recognition.

cargo run --release --features cuda --example doc_parser -- \
    --model-name unirec \
    --model-dir models/unirec-0.1b \
    --layout-model models/pp-doclayoutv2.onnx \
    --device cuda \
    document.jpg

UniRec (Direct Inference)

Run the UniRec model directly on an image.

cargo run --release --features cuda --example unirec -- \
    --model-dir models/unirec-0.1b \
    --device cuda \
    formula.png

PaddleOCR-VL (Direct Inference)

Run the PaddleOCR-VL model directly on an image with a specific task prompt.

# OCR task
cargo run --release --features cuda --example paddleocr_vl -- \
    --model-dir PaddleOCR-VL \
    --device cuda \
    --task ocr \
    document.jpg

# Table task
cargo run --release --features cuda --example paddleocr_vl -- \
    --model-dir PaddleOCR-VL \
    --device cuda \
    --task table \
    table.jpg

HunyuanOCR (Direct Inference)

cargo run --release --features cuda --example hunyuanocr -- \
    --model-dir ~/repos/HunyuanOCR \
    --device cuda \
    --prompt "Detect and recognize text in the image, and output the text coordinates in a formatted manner." \
    document.jpg