| Field | Value |
|---|---|
| Crates.io | optical-embeddings |
| lib.rs | optical-embeddings |
| version | 0.3.0 |
| created_at | 2025-10-23 22:22:16.196574+00 |
| updated_at | 2025-10-23 22:22:16.196574+00 |
| description | DeepSeek-OCR - compress text into images |
| homepage | https://github.com/tuned-org-uk/optical-embeddings-rs |
| repository | https://github.com/tuned-org-uk/optical-embeddings-rs |
| max_upload_size | |
| id | 1897747 |
| size | 385,386 |
A Rust implementation of DeepSeek-OCR, a vision-language model that compresses long text documents through optical encoding, built with the Burn deep learning framework.
This implementation is based on the paper:
Haoran Wei, Yaofeng Sun, Yukun Li (DeepSeek-AI). *DeepSeek-OCR: Contexts Optical Compression*. arXiv:2510.18234v1 [cs.CV], 21 Oct 2025. Paper (arXiv) | Official Repository
DeepSeek-OCR addresses the computational challenges of processing long textual contexts in Large Language Models (LLMs) by leveraging context optical compression, a novel approach that treats rendered text images as an efficient compression medium. Instead of processing thousands of text tokens, the model encodes document images into a compact set of vision tokens.
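The payoff is easiest to see as plain arithmetic: the compression ratio is the number of text tokens a document would need divided by the number of vision tokens the encoder emits. A minimal illustrative sketch of that bookkeeping (not the crate's API):

```rust
/// Hypothetical helper (not part of the crate's public API): optical
/// compression ratio = text tokens replaced / vision tokens produced.
fn compression_ratio(text_tokens: usize, vision_tokens: usize) -> f64 {
    text_tokens as f64 / vision_tokens as f64
}

fn main() {
    // E.g. a page worth ~1,000 text tokens encoded into 100 vision tokens
    // corresponds to a 10x optical compression ratio.
    let ratio = compression_ratio(1_000, 100);
    println!("optical compression: {ratio:.1}x");
}
```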
The system consists of two main components:

- **DeepEncoder**: a vision encoder that converts a rendered document image into a compact sequence of vision tokens.
- **DeepSeek-3B-MoE decoder**: a Mixture-of-Experts language model that reconstructs the original text from those vision tokens.
According to the paper's findings on the Fox benchmark, decoding (OCR) precision stays around 97% when the compression ratio is kept below 10× (text tokens at most ten times the vision tokens), and remains roughly 60% even at about 20× compression.
The model supports multiple resolution modes optimized for different compression ratios:
| Mode | Resolution | Vision Tokens | Typical Use |
|---|---|---|---|
| Tiny | 512×512 | 64 | Ultra-fast inference |
| Small | 640×640 | 100 | Balanced performance |
| Base | 1024×1024 | 256 | Default (10× compression target) |
| Large | 1280×1280 | 400 | High-precision OCR |
| Gundam | Dynamic | <800 | Complex documents |
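The table above maps naturally onto a small enum. The sketch below is illustrative only: the names `ResolutionMode`, `resolution`, and `vision_tokens` are assumptions rather than the crate's actual API, and the dynamic Gundam mode is omitted because its token count varies per document.

```rust
/// Illustrative sketch of the fixed resolution modes listed above;
/// type and method names are hypothetical, not the crate's public API.
#[derive(Clone, Copy, Debug)]
enum ResolutionMode {
    Tiny,  // 512x512, 64 vision tokens
    Small, // 640x640, 100 vision tokens
    Base,  // 1024x1024, 256 vision tokens (default, ~10x compression target)
    Large, // 1280x1280, 400 vision tokens
}

impl ResolutionMode {
    /// Square input resolution in pixels.
    fn resolution(self) -> u32 {
        match self {
            Self::Tiny => 512,
            Self::Small => 640,
            Self::Base => 1024,
            Self::Large => 1280,
        }
    }

    /// Number of vision tokens the encoder emits in this mode.
    fn vision_tokens(self) -> usize {
        match self {
            Self::Tiny => 64,
            Self::Small => 100,
            Self::Base => 256,
            Self::Large => 400,
        }
    }
}

fn main() {
    let mode = ResolutionMode::Base;
    println!(
        "{mode:?}: {r}x{r} -> {t} vision tokens",
        r = mode.resolution(),
        t = mode.vision_tokens()
    );
}
```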
This Rust implementation provides text-to-image rendering, an optical encoder that turns the rendered page into vision tokens, and an information-compression analysis pipeline, all built on Burn so it can run on CPU, WGPU, or CUDA backends.

Quick start: place a TrueType font at `assets/font.ttf`, then run the demo with `cargo run --release` and the test suite with `cargo test --all-features -- --nocapture`.
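Before encoding, the text is rasterized onto a page image. The pixel and byte counts in the test output further down (262,144 pixels, 786,432 bytes) correspond to a 512×512 RGB canvas, i.e. Tiny mode; the snippet below reproduces that bookkeeping with the `image` crate (the white background is an assumption for illustration).

```rust
use image::{Rgb, RgbImage};

fn main() {
    // A blank 512x512 page, matching Tiny mode in the table above.
    // White background is assumed; the test output only reports 2 unique colors.
    let page = RgbImage::from_pixel(512, 512, Rgb([255, 255, 255]));

    // 512 * 512 = 262,144 pixels; 3 bytes per RGB pixel = 786,432 raw bytes,
    // the same figures reported by the information-compression test below.
    let pixels = (page.width() * page.height()) as usize;
    let bytes = page.as_raw().len();
    println!("pixels: {pixels}, raw bytes: {bytes}");
}
```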
CPU only (default):

```bash
cargo build --release
cargo run --release
```

With WGPU (GPU; works with NVIDIA, AMD, Intel, and Apple Silicon):

```bash
cargo build --release --features wgpu
cargo run --release --features wgpu
```

With CUDA (NVIDIA only, fastest):

```bash
# Make sure the CUDA toolkit is installed first:
#   Ubuntu/Debian: sudo apt install nvidia-cuda-toolkit
#   Or download from: https://developer.nvidia.com/cuda-downloads
cargo build --release --features cuda
cargo run --release --features cuda
```

Check which GPU you have:

```bash
# NVIDIA
nvidia-smi

# AMD/Intel/General
vulkaninfo | grep -i "device name"

# Or just try WGPU (works with most GPUs)
cargo run --release --features wgpu
```
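Inside the code, these Cargo features typically pick the Burn backend at compile time. The sketch below shows that pattern; the re-export paths (`burn::backend::{NdArray, Wgpu, Cuda}`) are assumptions based on Burn's public backend module and can differ between Burn versions, so treat it as an outline rather than the crate's exact source.

```rust
// Sketch: compile-time backend selection via Cargo features.
// Backend type paths are assumptions; check the Burn version in Cargo.toml.

#[cfg(not(any(feature = "wgpu", feature = "cuda")))]
type Backend = burn::backend::NdArray<f32>; // CPU default

#[cfg(all(feature = "wgpu", not(feature = "cuda")))]
type Backend = burn::backend::Wgpu; // portable GPU (Vulkan/Metal/DX12)

#[cfg(feature = "cuda")]
type Backend = burn::backend::Cuda; // NVIDIA only

fn main() {
    // Build a small tensor on the selected backend's default device.
    let device = <Backend as burn::tensor::backend::Backend>::Device::default();
    let x = burn::tensor::Tensor::<Backend, 2>::zeros([2, 3], &device);
    println!("backend tensor shape: {:?}", x.dims());
}
```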
Try: `cargo test -- test_information_compression_pipeline --nocapture`

Example output from the information-compression test:
```
=============================================================
         Optical Embeddings Information Analysis
=============================================================

TEXT INFORMATION:
  Bytes: 641
  Characters: 641
  Words: 87
  Unique chars: 42
  Entropy (bits): 4.3794

IMAGE INFORMATION:
  Bytes: 786432
  Pixels: 262144
  Unique colors: 2
  Entropy (bits): 0.1507

VISION TOKENS:
  Token count: 64
  Embedding dim: 1024
  Total values: 65536

COMPRESSION METRICS:
  Text → Image: 0.0008× (smaller)
  Text → Tokens: 0.0098× (smaller)
  Image → Tokens: 12.0000× (compressed)
  Effective (ent): 39.5030× (compression)

INFORMATION FLOW:
  Original text: 641 bytes (4.3793884228266 bits entropy)
  Rendered image: 786432 bytes (0.15070330510950625 bits entropy)
  Vision tokens: 64 tokens × 1024 dims = 65536 values
  Effective rate: 80.12 bits/token

COMPRESSION RESULTS:
  Text words: 87
  Vision tokens: 64
  Words/token: 1.36
  Spatial compression: 1024 patches → 64 tokens = 16.0× reduction

Compression test passed!
- Achieved 16× spatial compression (1024 → 64 tokens)
- Word-to-token ratio: 1.36
- Effective compression: 1.36 words per vision token

test tests::tests::test_information_compression_pipeline ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 7 filtered out; finished in 1.28s
```
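The "Entropy (bits)" figures above appear to be per-symbol Shannon entropies, H = -Σ p_i · log2(p_i), computed over the character distribution of the text (4.3794 bits/char) and the pixel-value distribution of the rendered page (0.1507 bits/pixel, low because only two colors appear). A standalone sketch of that calculation, not the crate's internal code:

```rust
use std::collections::HashMap;

/// Per-symbol Shannon entropy in bits: H = -sum(p_i * log2(p_i)).
/// Standalone sketch; the crate's own analysis may differ in detail.
fn shannon_entropy_bits<T: std::hash::Hash + Eq>(symbols: &[T]) -> f64 {
    let mut counts: HashMap<&T, usize> = HashMap::new();
    for s in symbols {
        *counts.entry(s).or_insert(0) += 1;
    }
    let total = symbols.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / total;
            -p * p.log2()
        })
        .sum()
}

fn main() {
    let text = "Instead of processing thousands of text tokens, the model \
                encodes document images into a compact set of vision tokens.";
    let chars: Vec<char> = text.chars().collect();
    println!("text entropy:  {:.4} bits/char", shannon_entropy_bits(&chars));

    // A toy "image": ~97% background pixels and ~3% ink pixels gives a very
    // low per-pixel entropy, which is why rendered pages compress so well.
    let mut pixels = vec![255u8; 970];
    pixels.extend(std::iter::repeat(0u8).take(30));
    println!("pixel entropy: {:.4} bits/pixel", shannon_entropy_bits(&pixels));
}
```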