xml-xls-parser

version: 0.1.0
source: src
created_at: 2020-04-13 15:49:49.996347
updated_at: 2020-04-13 15:49:49.996347
description: Parse XLS files as XML
repository: https://github.com/blakehawkins/xml-xls-parser
id: 229735
size: 3,722,260
owner: blakehawkins


README

I encountered some XLS files that several tools fail to parse (xlrd, pandas, openpyxl, calamine).

The files appear to be XML documents built only from the following elements:

  • Workbook
  • Worksheet
  • Table
  • Row
  • Cell
  • Data
  • Styles
  • Style
  • NumberFormat
  • Font
  • Alignment

It is unclear what makes the files unreadable by XLS and XLSX parsers.
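A quick way to tell whether a given `.xls` file is one of these XML documents, rather than a binary or zipped workbook, is to look at its first bytes. The sketch below is not part of this crate: `sniff` and `XlsKind` are hypothetical names. It only distinguishes the OLE2 compound-file signature used by binary `.xls`, the ZIP signature used by `.xlsx`, and text that starts with `<` (which could also be HTML), so it does not explain why the parsers fail.

```rust
/// Rough container kinds that an `.xls`-named file can turn out to be.
#[derive(Debug, PartialEq)]
enum XlsKind {
    /// OLE2 compound file: the classic binary `.xls`.
    BinaryOle2,
    /// ZIP archive: an `.xlsx`-style workbook.
    Zip,
    /// Text starting with '<': XML (or possibly HTML).
    Xml,
    /// None of the above.
    Unknown,
}

/// Guess the container kind from the leading bytes of a file.
fn sniff(bytes: &[u8]) -> XlsKind {
    const OLE2_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];
    const ZIP_MAGIC: [u8; 4] = [b'P', b'K', 0x03, 0x04];
    const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

    if bytes.starts_with(&OLE2_MAGIC) {
        return XlsKind::BinaryOle2;
    }
    if bytes.starts_with(&ZIP_MAGIC) {
        return XlsKind::Zip;
    }
    // Skip an optional UTF-8 byte-order mark, then any leading whitespace.
    let rest = bytes.strip_prefix(&UTF8_BOM[..]).unwrap_or(bytes);
    let start = rest
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .unwrap_or(rest.len());
    if rest[start..].starts_with(b"<") {
        XlsKind::Xml
    } else {
        XlsKind::Unknown
    }
}

fn main() {
    // Path from the first argument, defaulting to the file name used below.
    let path = std::env::args().nth(1).unwrap_or_else(|| "input.xls".to_string());
    match std::fs::read(&path) {
        Ok(bytes) => println!("{path}: {:?}", sniff(&bytes)),
        Err(e) => eprintln!("could not read {path}: {e}"),
    }
}
```

In a scratch project, `cargo run -- /path/to/file.xls` prints the detected kind for that file.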

This project reads XML-formatted XLS files that contain only the elements above and emits a best-effort TSV.

$ cp /path/to/file.xls input.xls
$ cargo run > out.tsv
$ less -S out.tsv
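TSV has no quoting rules, so a cell whose text contains a tab or line break cannot be written as-is. A minimal sketch of one best-effort approach, assuming cells have already been reduced to plain strings: replace those characters with spaces and join. `to_tsv` is a hypothetical helper; the README does not say how the crate itself handles such cells.

```rust
/// Join rows of cell text into tab-separated values.
///
/// Best effort: every tab, carriage return or newline character inside a
/// cell is replaced with a single space instead of being escaped.
fn to_tsv(rows: &[Vec<String>]) -> String {
    let mut out = String::new();
    for row in rows {
        let cells: Vec<String> = row
            .iter()
            .map(|cell| cell.replace(|c: char| c == '\t' || c == '\r' || c == '\n', " "))
            .collect();
        out.push_str(&cells.join("\t"));
        out.push('\n');
    }
    out
}

fn main() {
    let rows = vec![
        vec!["Name".to_string(), "Note".to_string()],
        vec!["a".to_string(), "line one\nline two".to_string()],
    ];
    // Written to stdout, so `cargo run > out.tsv` captures it.
    print!("{}", to_tsv(&rows));
}
```

Replacing rather than escaping keeps every output line a single row, at the cost of losing the original line breaks inside cells.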

How?

It's just a set of serde-annotated types describing the document structure, deserialized with serde-xml-rs.

Expect to modify the code if your source document contains any elements other than those listed above.
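The README does not show the crate's types, so the following is only a sketch of the nesting they would need to mirror (Workbook > Worksheet > Table > Row > Cell > Data). The struct, field and function names are hypothetical; the serde derives and the serde-xml-rs call are left out so the sketch compiles with the standard library alone, and the styling elements are omitted because they do not change the cell text. It also assumes a Cell may lack a Data child, and it simply concatenates all worksheets.

```rust
// Hypothetical plain structs mirroring the element nesting listed above.
// In the crate, types like these would carry serde derives and be filled
// by serde-xml-rs; here they are built by hand.
struct Workbook {
    worksheets: Vec<Worksheet>,
}

struct Worksheet {
    table: Table,
}

struct Table {
    rows: Vec<Row>,
}

struct Row {
    cells: Vec<Cell>,
}

struct Cell {
    /// Assumed optional: a cell may exist without any Data child.
    data: Option<Data>,
}

struct Data {
    text: String,
}

/// Flatten every worksheet into rows of cell text; a missing Data becomes "".
fn workbook_rows(wb: &Workbook) -> Vec<Vec<String>> {
    wb.worksheets
        .iter()
        .flat_map(|ws| ws.table.rows.iter())
        .map(|row| {
            row.cells
                .iter()
                .map(|cell| cell.data.as_ref().map_or(String::new(), |d| d.text.clone()))
                .collect::<Vec<String>>()
        })
        .collect()
}

/// Convenience constructor for a cell holding text.
fn cell(text: &str) -> Cell {
    Cell { data: Some(Data { text: text.to_string() }) }
}

fn main() {
    let wb = Workbook {
        worksheets: vec![Worksheet {
            table: Table {
                rows: vec![
                    Row { cells: vec![cell("Name"), cell("Qty")] },
                    Row { cells: vec![cell("apple"), Cell { data: None }] },
                ],
            },
        }],
    };
    for row in workbook_rows(&wb) {
        println!("{}", row.join("\t"));
    }
}
```

With a model like this, supporting an extra element amounts to adding a field or struct at the matching level of the tree.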
