| Crates.io | oxify-connect-vision |
| lib.rs | oxify-connect-vision |
| version | 0.1.0 |
| created_at | 2026-01-19 05:25:26.675384+00 |
| updated_at | 2026-01-19 05:25:26.675384+00 |
| description | Vision/OCR connector for OxiFY workflows |
| homepage | |
| repository | https://github.com/cool-japan/oxify |
| max_upload_size | |
| id | 2053778 |
| size | 674,450 |
🔍 Vision/OCR connector for OxiFY workflow automation engine
High-performance OCR (Optical Character Recognition) library supporting multiple backends with GPU acceleration, async processing, and comprehensive output formats. Designed for production workflows requiring reliable document processing at scale.
| Provider | Backend | GPU | Languages | Quality | Setup |
|---|---|---|---|---|---|
| Mock | In-memory | ❌ | Any | Low | None |
| Tesseract | leptess | ❌ | 100+ | Medium | System package |
| Surya | ONNX Runtime | ✅ | 6+ | High | ONNX models |
| PaddleOCR | ONNX Runtime | ✅ | 80+ | High | ONNX models |
Add to your `Cargo.toml`:

```toml
[dependencies]
oxify-connect-vision = { path = "../oxify-connect-vision" }
```

With specific backends enabled:

```toml
[dependencies]
oxify-connect-vision = { path = "../oxify-connect-vision", features = ["mock", "tesseract"] }
```

With all backends and GPU support:

```toml
[dependencies]
oxify-connect-vision = { path = "../oxify-connect-vision", features = ["mock", "tesseract", "surya", "paddle", "cuda"] }
```
| Feature | Description | Dependencies |
|---|---|---|
| `mock` | Mock provider | None (default) |
| `tesseract` | Tesseract OCR | leptess, tesseract-sys |
| `surya` | Surya ONNX | ort |
| `paddle` | PaddleOCR ONNX | ort, ndarray |
| `onnx` | ONNX Runtime base | ort |
| `cuda` | CUDA GPU support | CUDA toolkit |
| `coreml` | CoreML (macOS) | CoreML |
Quick start with the mock provider:

```rust
use oxify_connect_vision::{create_provider, VisionProviderConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create provider
    let config = VisionProviderConfig::mock();
    let provider = create_provider(&config)?;

    // Load model (idempotent)
    provider.load_model().await?;

    // Process image
    let image_data = std::fs::read("document.png")?;
    let result = provider.process_image(&image_data).await?;

    println!("📄 Text: {}", result.text);
    println!("📝 Markdown:\n{}", result.markdown);
    println!("📊 Blocks: {}", result.blocks.len());
    Ok(())
}
```
Tesseract with a specific language:

```rust
use oxify_connect_vision::{create_provider, VisionProviderConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure Tesseract for Japanese
    let config = VisionProviderConfig::tesseract(Some("jpn"));
    let provider = create_provider(&config)?;
    provider.load_model().await?;

    // Process image
    let image_data = std::fs::read("japanese_doc.png")?;
    let result = provider.process_image(&image_data).await?;

    // Access structured results
    for block in &result.blocks {
        println!(
            "🔤 {} (role: {}, confidence: {:.2}%)",
            block.text,
            block.role,
            block.confidence * 100.0
        );
    }
    Ok(())
}
```
Surya with GPU acceleration:

```rust
use oxify_connect_vision::{create_provider, VisionProviderConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure Surya with GPU
    let config = VisionProviderConfig::surya(
        "/path/to/models", // Model directory
        true,              // Enable GPU
    );
    let provider = create_provider(&config)?;
    provider.load_model().await?;

    let image_data = std::fs::read("complex_layout.png")?;

    let start = std::time::Instant::now();
    let result = provider.process_image(&image_data).await?;
    let duration = start.elapsed();

    println!("⚡ Processed in {:?}", duration);
    println!("📊 Found {} text blocks", result.blocks.len());
    Ok(())
}
```
Cache OCR results to avoid reprocessing identical images:

```rust
use oxify_connect_vision::{VisionCache, create_provider, VisionProviderConfig};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a bounded cache with a one-hour TTL
    let mut cache = VisionCache::new();
    cache.set_max_entries(1000);
    cache.set_ttl(Duration::from_secs(3600));

    let provider = create_provider(&VisionProviderConfig::mock())?;
    provider.load_model().await?;

    let image_data = std::fs::read("document.png")?;
    let cache_key = format!("doc_{}", compute_hash(&image_data));

    // Check cache first
    let result = if let Some(cached) = cache.get(&cache_key) {
        println!("💾 Cache hit!");
        cached
    } else {
        println!("🔄 Processing image...");
        let result = provider.process_image(&image_data).await?;
        cache.put(cache_key.clone(), result.clone());
        result
    };

    println!("📄 Text: {}", result.text);
    Ok(())
}

fn compute_hash(data: &[u8]) -> String {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    format!("{:x}", hasher.finish())
}
```
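Note that `DefaultHasher` is not guaranteed to produce the same value across Rust releases, so keys derived from it should not be persisted between deployments. A minimal sketch of a stable alternative (FNV-1a, illustrative and not part of the crate):

```rust
// Sketch: a version-stable FNV-1a hash for cache keys that may be
// persisted. Constants are the standard 64-bit FNV offset basis and prime.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

fn main() {
    // Same bytes always yield the same key, across runs and Rust versions.
    let key = format!("doc_{:x}", fnv1a_64(b"document bytes"));
    println!("{}", key);
}
```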
The Oxify CLI provides convenient commands for OCR operations:
```sh
# List available providers
oxify vision list

# Process an image with a specific provider
oxify vision process document.png \
  --provider tesseract \
  --format markdown \
  --output output.md

# Process with language specification
oxify vision process japanese.png \
  --provider tesseract \
  --language jpn

# Get detailed provider information
oxify vision info surya

# Benchmark multiple providers
oxify vision benchmark test.png \
  --providers tesseract,surya,paddle \
  --iterations 10

# Extract structured data
oxify vision extract receipt.png \
  --data-type receipt \
  --provider paddle
```
A Vision node in a workflow definition:

```json
{
  "nodes": [
    {
      "id": "ocr-node",
      "name": "Document OCR",
      "kind": {
        "type": "Vision",
        "config": {
          "provider": "surya",
          "model_path": "/models/surya",
          "output_format": "markdown",
          "use_gpu": true,
          "language": "en",
          "image_input": "{{input.document_image}}"
        }
      }
    }
  ]
}
```
Chaining OCR output into an LLM node:

```json
{
  "nodes": [
    {
      "id": "ocr",
      "name": "Extract Text",
      "kind": {
        "type": "Vision",
        "config": {
          "provider": "tesseract",
          "image_input": "{{input.image}}"
        }
      }
    },
    {
      "id": "analyze",
      "name": "Analyze Content",
      "kind": {
        "type": "LLM",
        "config": {
          "provider": "openai",
          "model": "gpt-4",
          "prompt_template": "Analyze this document:\n\n{{ocr.markdown}}"
        }
      }
    }
  ],
  "edges": [
    {"from": "ocr", "to": "analyze"}
  ]
}
```
Text: simple text extraction with whitespace preservation, suitable for full-text search and basic NLP.
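An illustrative plain-text extraction (sample output, not from a real run):

```
Document Title

Section Header

Regular text content with formatting preserved.
List item 1
List item 2
```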
Markdown: structure-aware output preserving headings, emphasis, lists, and tables:

```markdown
# Document Title

## Section Header

Regular text content with **formatting** preserved.

- List item 1
- List item 2

| Column 1 | Column 2 |
|----------|----------|
| Data 1   | Data 2   |
```
JSON: the full structured result with text blocks and processing metadata:

```json
{
  "text": "Full document text...",
  "markdown": "# Document Title\n\n...",
  "blocks": [
    {
      "text": "Document Title",
      "bbox": [0.1, 0.1, 0.9, 0.2],
      "confidence": 0.98,
      "role": "Title"
    }
  ],
  "metadata": {
    "provider": "surya",
    "processing_time_ms": 145,
    "image_width": 1920,
    "image_height": 1080,
    "languages": ["en"],
    "page_count": 1
  }
}
```
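A common downstream step is filtering blocks by confidence before further processing. A minimal sketch, using a local stand-in struct that mirrors the JSON shape above rather than the crate's actual types:

```rust
// `Block` is a hypothetical local stand-in for the crate's block type,
// mirroring the JSON fields shown above.
struct Block {
    text: String,
    confidence: f32,
    role: String,
}

// Keep only blocks at or above the given confidence threshold.
fn high_confidence(blocks: &[Block], threshold: f32) -> Vec<&Block> {
    blocks.iter().filter(|b| b.confidence >= threshold).collect()
}

fn main() {
    let blocks = vec![
        Block { text: "Document Title".into(), confidence: 0.98, role: "Title".into() },
        Block { text: "smudged line".into(), confidence: 0.42, role: "Text".into() },
    ];
    for b in high_confidence(&blocks, 0.9) {
        println!("{} ({})", b.text, b.role);
    }
}
```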
Ubuntu/Debian:

```sh
sudo apt update
sudo apt install tesseract-ocr tesseract-ocr-eng tesseract-ocr-jpn
```

macOS:

```sh
brew install tesseract
brew install tesseract-lang  # Additional languages
```

Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki

Verify the installation:

```sh
tesseract --version
tesseract --list-langs
```
Surya expects a model directory containing detection and recognition models:

```
models/surya/
├── detection.onnx
└── recognition.onnx
```

```rust
VisionProviderConfig::surya("/path/to/models/surya", false)
```
PaddleOCR expects detection, recognition, and classification models:

```
models/paddle/
├── det.onnx  # Detection model
├── rec.onnx  # Recognition model
└── cls.onnx  # Classification model
```

```rust
VisionProviderConfig::paddle("/path/to/models/paddle", true)
```
Tested on: AMD Ryzen 9 5950X, NVIDIA RTX 3090, 1920x1080 images
| Provider | CPU Time | GPU Time | Memory | Accuracy* |
|---|---|---|---|---|
| Mock | <1ms | - | <1MB | N/A |
| Tesseract | 450ms | - | ~200MB | 85% |
| Surya | 320ms | 45ms | ~1.5GB | 92% |
| PaddleOCR | 380ms | 55ms | ~1.8GB | 90% |
*Accuracy measured on standard document dataset
Match on `VisionError` variants for granular error handling:

```rust
use oxify_connect_vision::{VisionError, create_provider, VisionProviderConfig};

async fn safe_ocr(image: &[u8]) -> Result<String, String> {
    let config = VisionProviderConfig::tesseract(None);
    let provider = create_provider(&config)
        .map_err(|e| format!("Provider creation failed: {}", e))?;

    provider.load_model().await
        .map_err(|e| format!("Model loading failed: {}", e))?;

    match provider.process_image(image).await {
        Ok(result) => Ok(result.text),
        Err(VisionError::InvalidImage(msg)) => {
            Err(format!("Invalid image: {}", msg))
        }
        Err(VisionError::ProcessingFailed(msg)) => {
            Err(format!("Processing failed: {}", msg))
        }
        Err(e) => Err(format!("Unknown error: {}", e)),
    }
}
```
Run tests:

```sh
# Unit tests (mock provider)
cargo test

# Integration tests (requires backend setup; ignored tests run after `--`)
cargo test --features tesseract -- --ignored

# All features
cargo test --all-features
```
Example test:

```rust
use oxify_connect_vision::{create_provider, VisionProviderConfig};

#[tokio::test]
async fn test_mock_ocr() {
    let config = VisionProviderConfig::mock();
    let provider = create_provider(&config).unwrap();
    provider.load_model().await.unwrap();

    let result = provider.process_image(b"test").await.unwrap();
    assert!(!result.text.is_empty());
    assert_eq!(result.metadata.provider, "mock");
}
```
Always call `load_model()` before processing:

```rust
provider.load_model().await?;
```
If ONNX-based features fail to build or load, check which `ort` version was resolved:

```sh
cargo tree | grep ort
```

To reduce memory pressure, lower the cache bound (for example, `cache.set_max_entries(100)`).

We welcome contributions! See TODO.md for planned enhancements.
Apache-2.0 - See LICENSE file in the root directory.
Built with ❤️ for the Oxify workflow automation platform