Crates.io | docx-parser |
lib.rs | docx-parser |
version | 0.1.1 |
source | src |
created_at | 2024-05-21 13:33:09.801552 |
updated_at | 2024-05-21 13:33:09.801552 |
description | Parse Word and OpenOffice DOCX files, and output markdown or JSON |
homepage | |
repository | https://github.com/erikvullings/docx-parser |
max_upload_size | |
id | 1246845 |
size | 497,239 |
This package uses the docx-rs crate to parse docx files. It subsequently converts the parsed docx file into Markdown format. Alternatively, it can also be used to convert docx files into JSON format, where only the structure relevant for creating Markdown documents is kept.
It can be used as a library, or you can install it and use it from the command line.
$ git clone https://github.com/erikvullings/docx-parser.git
$ cargo install --path .
$ docx-parser -h
Processes a DOCX file and outputs as Markdown or JSON
Usage: docx-parser [OPTIONS] <FILE>
Arguments:
<FILE> The input DOCX file
Options:
-o, --output <OUTPUT> Sets the output destination. Default is console
-f, --format <FORMAT> Sets the output format. Default is markdown. Options: md, json, pretty_json
-h, --help Print help
-V, --version Print version
# Example
$ docx-parser ./test/tables.docx -f pretty_json
use docx_parser::MarkdownDocument;
let markdown_doc = MarkdownDocument::from_file("./test/tables.docx");
let markdown = markdown_doc.to_markdown(true);
let json = markdown_doc.to_json(true);
println!("\n\n{}", markdown);
println!("\n\n{}", json);
cargo update
cargo test
cargo build --release