| Crates.io | ragzilla-parsing |
| lib.rs | ragzilla-parsing |
| version | 0.1.0 |
| created_at | 2025-03-21 20:39:43.526543+00 |
| updated_at | 2025-03-21 20:39:43.526543+00 |
| description | A Rust library for parsing PDFs using the Mistral AI OCR API |
| homepage | |
| repository | https://github.com/excoffierleonard/ragzilla |
| max_upload_size | |
| id | 1601139 |
| size | 43,562 |
A Rust library for parsing PDFs using the Mistral AI OCR API.
Add the following to your Cargo.toml file:
[dependencies]
ragzilla-parsing = "0.1.0"
use ragzilla_parsing::parse_pdf;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let api_key = std::env::var("MISTRAL_API_KEY").expect("MISTRAL_API_KEY must be set");
    let document_url = "https://arxiv.org/pdf/2201.04234";

    let chunks = parse_pdf(document_url, &api_key).await?;
    println!("Parsed {} pages from PDF", chunks.len());

    Ok(())
}
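The example above panics if MISTRAL_API_KEY is missing. A minimal sketch of a gentler check follows; `require_api_key` is a hypothetical helper, not part of ragzilla-parsing, and it only validates the value returned by `std::env::var`:

```rust
use std::env::VarError;

/// Hypothetical helper (not provided by ragzilla-parsing): turns the result of
/// `std::env::var("MISTRAL_API_KEY")` into a trimmed key or a readable error.
fn require_api_key(raw: Result<String, VarError>) -> Result<String, String> {
    match raw {
        // Accept a value that is non-empty once surrounding whitespace is removed.
        Ok(value) if !value.trim().is_empty() => Ok(value.trim().to_string()),
        // The variable exists but holds only whitespace or nothing.
        Ok(_) => Err("MISTRAL_API_KEY is set but empty".to_string()),
        // The variable is not defined in the environment.
        Err(VarError::NotPresent) => Err("MISTRAL_API_KEY is not set".to_string()),
        // Any other failure, such as a value that is not valid Unicode.
        Err(other) => Err(format!("MISTRAL_API_KEY is unreadable: {other}")),
    }
}
```

One way to use it would be `let api_key = require_api_key(std::env::var("MISTRAL_API_KEY"))?;` inside a `main` that returns `Result<(), Box<dyn std::error::Error>>`, since both `String` and `reqwest::Error` convert into that boxed error type.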
parse_pdf
pub async fn parse_pdf(document_url: &str, api_key: &str) -> Result<Vec<String>, reqwest::Error>
Parses a PDF document from a URL and converts it to markdown text using Mistral AI's OCR service.
document_url: URL of the PDF document to parse.
api_key: Your Mistral AI API key.
Returns a Result containing a vector of strings, where each string is the Markdown content of one page.
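Because the result holds one Markdown string per page, a caller may want to merge the pages into a single document. The sketch below assumes only that return shape; `join_pages` and the HTML comment page marker are illustrative choices, not part of the crate:

```rust
/// Hypothetical helper (not provided by ragzilla-parsing): joins per-page
/// Markdown into one document, placing a page marker before each page.
/// Page numbers start at 1; trailing whitespace on each page is trimmed.
fn join_pages(pages: &[String]) -> String {
    pages
        .iter()
        .enumerate()
        .map(|(index, page)| format!("<!-- page {} -->\n{}", index + 1, page.trim_end()))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

With the `chunks` vector from the usage example, `std::fs::write("output.md", join_pages(&chunks))` would save the merged Markdown to disk; the file name here is only a placeholder.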
MIT License