hocr-parser

Crates.io: hocr-parser
lib.rs: hocr-parser
version: 0.1.0
source: src
created_at: 2024-05-14 10:10:07.289511
updated_at: 2024-05-14 10:10:07.289511
description: A parser for the hOCR format
homepage: https://github.com/styrowolf/hocr-parser
repository: https://github.com/styrowolf/hocr-parser
id: 1239445
size: 39,311
author: Oğuz Kurt (styrowolf)


hocr-parser

A parser for the hOCR format, "an open standard for representing document layout analysis and OCR results as a subset of HTML."
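In hOCR, each OCR element is an ordinary HTML element whose class names its type (for example ocr_line or ocrx_word) and whose title attribute carries semicolon-separated properties such as bbox and x_wconf. The std-only Rust sketch below splits such a title string into (name, values) pairs. The function name parse_title is hypothetical and is not part of hocr-parser's API; it also ignores quoted values (such as file names containing spaces), which a complete parser would need to handle.

```rust
/// Split an hOCR title attribute into (property name, values) pairs.
/// Hypothetical helper for illustration; not the hocr-parser API.
/// Simplification: quoted values containing spaces or ';' are not handled.
fn parse_title(title: &str) -> Vec<(&str, Vec<&str>)> {
    title
        .split(';')
        .map(str::trim)
        .filter(|prop| !prop.is_empty())
        .map(|prop| {
            // The first whitespace-separated token is the property name;
            // the remaining tokens are its values.
            let mut tokens = prop.split_whitespace();
            let name = tokens.next().unwrap_or("");
            (name, tokens.collect())
        })
        .collect()
}

fn main() {
    let title = "bbox 36 92 580 122; x_wconf 93";
    for (name, values) in parse_title(title) {
        println!("{name}: {values:?}");
    }
}
```

The returned slices borrow from the input string, so no property name or value is copied.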

Design

This parser uses roxmltree to parse the XHTML. It provides easy access to the embedded hOCR data through the HOCR and Element structs, as well as their "borrowed" counterparts, which avoid allocating new strings for property names.
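The owned/borrowed split described above is a common Rust pattern. The sketch below illustrates the idea with two hypothetical types, OwnedElement and BorrowedElement; they are not hocr-parser's actual HOCR or Element structs, whose fields may differ. The borrowed form holds &str slices into the source text, so building it allocates only the map and vectors, while conversion to the owned form copies every string so the data can outlive the source.

```rust
use std::collections::HashMap;

/// Owned form: every class and property string is copied into a new String.
/// Hypothetical type for illustration; not hocr-parser's Element struct.
#[derive(Debug, PartialEq)]
struct OwnedElement {
    class: String,
    properties: HashMap<String, Vec<String>>,
}

/// Borrowed form: names and values are slices of the source document.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
struct BorrowedElement<'a> {
    class: &'a str,
    properties: HashMap<&'a str, Vec<&'a str>>,
}

impl<'a> BorrowedElement<'a> {
    /// Build from a class name and an hOCR title attribute without copying text.
    fn new(class: &'a str, title: &'a str) -> Self {
        let mut properties: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
        for prop in title.split(';') {
            let mut tokens = prop.split_whitespace();
            // Skip empty segments; otherwise the first token is the name.
            if let Some(name) = tokens.next() {
                properties.insert(name, tokens.collect());
            }
        }
        BorrowedElement { class, properties }
    }

    /// Copy everything into the owned form when the data must outlive the source.
    fn to_owned_element(&self) -> OwnedElement {
        OwnedElement {
            class: self.class.to_string(),
            properties: self
                .properties
                .iter()
                .map(|(k, v)| {
                    (
                        k.to_string(),
                        v.iter().map(|s| s.to_string()).collect::<Vec<String>>(),
                    )
                })
                .collect(),
        }
    }
}

fn main() {
    let source = String::from("bbox 36 92 580 122; x_wconf 93");
    let borrowed = BorrowedElement::new("ocrx_word", &source);
    let owned = borrowed.to_owned_element();
    println!("{:?}", owned.properties.get("bbox"));
}
```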

The parser does not fully validate whether the file adheres to the hOCR specification. It checks the required metadata and the validity of hOCR element and property names, but it does not check property values.
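The level of checking described above can be sketched as follows: verify that required metadata is present, that the element class is a known hOCR class, and that each property name in the title attribute is known, while leaving values untouched. This is a minimal std-only sketch under stated assumptions: the class and property lists are a small illustrative subset, the required metadata names ocr-system and ocr-capabilities are taken from the hOCR specification rather than from hocr-parser's source, and validate and ValidationError are hypothetical names, not the crate's API.

```rust
/// Illustrative subset of hOCR element classes and property names (assumption).
const KNOWN_CLASSES: &[&str] = &["ocr_page", "ocr_carea", "ocr_par", "ocr_line", "ocrx_word"];
const KNOWN_PROPERTIES: &[&str] = &["bbox", "baseline", "x_wconf", "x_size", "ppageno", "image"];

/// Metadata names the hOCR specification marks as required (assumption for this sketch).
const REQUIRED_META: &[&str] = &["ocr-system", "ocr-capabilities"];

/// Hypothetical error type for illustration.
#[derive(Debug, PartialEq)]
enum ValidationError {
    MissingMeta(String),
    UnknownClass(String),
    UnknownProperty(String),
}

/// Check metadata presence and element/property names.
/// Property values are deliberately not inspected.
fn validate(meta_names: &[&str], class: &str, title: &str) -> Result<(), ValidationError> {
    for required in REQUIRED_META {
        if !meta_names.contains(required) {
            return Err(ValidationError::MissingMeta(required.to_string()));
        }
    }
    if !KNOWN_CLASSES.contains(&class) {
        return Err(ValidationError::UnknownClass(class.to_string()));
    }
    for prop in title.split(';') {
        // Only the first token (the property name) is checked.
        if let Some(name) = prop.split_whitespace().next() {
            if !KNOWN_PROPERTIES.contains(&name) {
                return Err(ValidationError::UnknownProperty(name.to_string()));
            }
        }
    }
    Ok(())
}

fn main() {
    let meta = ["ocr-system", "ocr-capabilities"];
    println!("{:?}", validate(&meta, "ocrx_word", "bbox 10 10 50 30; x_wconf 91"));
}
```

Because only names are checked, a title such as "bbox not numbers" passes; rejecting it would require parsing each property's values.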

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
