docx-parser

crates.io: docx-parser
lib.rs: docx-parser
version: 0.1.1
source: src
created_at: 2024-05-21 13:33:09.801552
updated_at: 2024-05-21 13:33:09.801552
description: Parse Word and OpenOffice DOCX files, and output markdown or JSON
repository: https://github.com/erikvullings/docx-parser
id: 1246845
size: 497,239
author: Erik Vullings (erikvullings)

DOCX-PARSER

This package uses the docx-rs crate to parse DOCX files and converts the parsed document into Markdown. Alternatively, it can convert DOCX files into JSON, keeping only the structure relevant for creating Markdown documents.

It can be used as a library, or you can install it and use it from the command line.
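
To make the idea of "only the structure relevant for creating Markdown" more concrete, here is a minimal sketch using only the standard library. The `Block` enum and the `render_markdown` and `table_row` functions are hypothetical names invented for this illustration; they are not docx-parser's actual types or API, and the real crate may model documents quite differently.

```rust
/// Hypothetical, simplified model of Markdown-relevant document structure.
/// These types are NOT docx-parser's actual types; they only illustrate the idea.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Heading { level: usize, text: String },
    Paragraph(String),
    Table { header: Vec<String>, rows: Vec<Vec<String>> },
}

/// Render one table row as a Markdown pipe-table line.
fn table_row(cells: &[String]) -> String {
    format!("| {} |", cells.join(" | "))
}

/// Render the blocks as Markdown, separating blocks with a blank line.
fn render_markdown(blocks: &[Block]) -> String {
    let rendered: Vec<String> = blocks
        .iter()
        .map(|block| match block {
            Block::Heading { level, text } => format!("{} {}", "#".repeat(*level), text),
            Block::Paragraph(text) => text.clone(),
            Block::Table { header, rows } => {
                let mut lines = vec![table_row(header)];
                // Separator row: one `---` cell per header column.
                lines.push(table_row(&vec!["---".to_string(); header.len()]));
                lines.extend(rows.iter().map(|row| table_row(row)));
                lines.join("\n")
            }
        })
        .collect();
    rendered.join("\n\n")
}
```

Styling details that Markdown cannot express (fonts, page layout, and so on) simply have no place in such a model, which is why a reduced structure of this kind is enough to drive Markdown output.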

CLI application

$ git clone https://github.com/erikvullings/docx-parser.git
$ cd docx-parser
$ cargo install --path .
$ docx-parser -h

Processes a DOCX file and outputs as Markdown or JSON

Usage: docx-parser [OPTIONS] <FILE>

Arguments:
  <FILE>  The input DOCX file

Options:
  -o, --output <OUTPUT>  Sets the output destination. Default is console
  -f, --format <FORMAT>  Sets the output format. Default is markdown. Options: md, json, pretty_json
  -h, --help             Print help
  -V, --version          Print version

# Example
$ docx-parser ./test/tables.docx -f pretty_json
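
The `--format` values listed in the help text (`md`, `json`, `pretty_json`, with Markdown as the default) map naturally onto an enum. The sketch below shows one way such a mapping could look using only the standard library. `OutputFormat` and `resolve_format` are hypothetical names chosen for this example; this is not the CLI's actual implementation.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the values listed for `--format` in the help text.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum OutputFormat {
    /// `md` (the documented default)
    #[default]
    Markdown,
    /// `json`
    Json,
    /// `pretty_json`
    PrettyJson,
}

impl FromStr for OutputFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "md" => Ok(OutputFormat::Markdown),
            "json" => Ok(OutputFormat::Json),
            "pretty_json" => Ok(OutputFormat::PrettyJson),
            other => Err(format!(
                "unknown format '{other}', expected md, json or pretty_json"
            )),
        }
    }
}

/// Resolve an optional `-f` value, falling back to the default when it is absent.
fn resolve_format(arg: Option<&str>) -> Result<OutputFormat, String> {
    match arg {
        Some(value) => value.parse(),
        None => Ok(OutputFormat::default()),
    }
}
```

Rejecting unknown values with an error, rather than silently falling back to Markdown, keeps a typo such as `-f jsno` from producing output in an unexpected format.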

Library

use docx_parser::MarkdownDocument;

fn main() {
    // Parse the DOCX file.
    let markdown_doc = MarkdownDocument::from_file("./test/tables.docx");
    // Render the parsed document as Markdown and as JSON.
    let markdown = markdown_doc.to_markdown(true);
    let json = markdown_doc.to_json(true);

    println!("\n\n{}", markdown);
    println!("\n\n{}", json);
}
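
The library example prints its results to the console. If you would rather save a rendered string to a file, much as the CLI's `-o` option does, a small helper like the one below may be enough. `write_output` is a hypothetical function written for this sketch with the standard library only; it is not part of docx-parser.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `content` to `destination`, or to standard output when no path is given.
/// `write_output` is a hypothetical helper, not part of docx-parser.
fn write_output(content: &str, destination: Option<&Path>) -> io::Result<()> {
    match destination {
        // Create (or overwrite) the file and write the whole string to it.
        Some(path) => fs::write(path, content),
        None => {
            let mut stdout = io::stdout().lock();
            stdout.write_all(content.as_bytes())?;
            stdout.flush()
        }
    }
}
```

With the strings from the example above, this could be called as `write_output(&markdown, Some(Path::new("tables.md")))`, where `tables.md` is an arbitrary file name chosen for illustration.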

Development commands

cargo update
cargo test
cargo build --release

cargo fmt