biodata-parsers

Crates.io: biodata-parsers
lib.rs: biodata-parsers
version: 0.1.0
source: src
created_at: 2017-01-15 01:56:35.198014
updated_at: 2017-01-15 01:56:35.198014
description: Scripts for parsing UniParc XML files downloaded from the UniProt website into CSV files.
homepage: https://ostrokach.github.io/uniparc_xml_parser
repository: https://github.com/ostrokach/uniparc_xml_parser
max_upload_size:
id: 8074
size: 45,016
Alexey Strokach (ostrokach)

documentation

https://ostrokach.github.io/uniparc_xml_parser

README

UniParc XML parser

Process the UniParc XML file (uniparc_all.xml.gz) downloaded from the UniProt website into CSV files that can be loaded into a relational database.
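To illustrate the general idea of streaming XML in and writing CSV out, here is a minimal sketch using only the Rust standard library. It is not the crate's actual implementation. The element names (`accession`, `sequence`), the `length` attribute, the one-tag-per-line layout, and the output columns are all assumptions made for illustration, as are the helper names `element_text`, `attribute`, and `write_csv`.

```rust
use std::io::{self, BufRead, Write};

/// Return the text between `<tag>` and `</tag>` on a single line, if present.
/// Hypothetical helper: assumes the whole element sits on one line.
fn element_text<'a>(line: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}>", tag);
    let close = format!("</{}>", tag);
    let start = line.find(&open)? + open.len();
    let end = line[start..].find(&close)? + start;
    Some(&line[start..end])
}

/// Return the value of `name="..."` on a single line, if present.
/// Hypothetical helper: does not handle escaped quotes or single quotes.
fn attribute<'a>(line: &'a str, name: &str) -> Option<&'a str> {
    let key = format!(" {}=\"", name);
    let start = line.find(&key)? + key.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

/// Stream lines from `input` and write one `accession,sequence_length`
/// CSV row per entry. The column choice is an assumption for this sketch.
fn write_csv<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    writeln!(output, "accession,sequence_length")?;
    let mut accession: Option<String> = None;
    for line in input.lines() {
        let line = line?;
        if let Some(text) = element_text(&line, "accession") {
            // Remember the accession until the matching sequence line arrives.
            accession = Some(text.to_string());
        } else if line.contains("<sequence") {
            if let (Some(acc), Some(len)) = (accession.take(), attribute(&line, "length")) {
                writeln!(output, "{},{}", acc, len)?;
            }
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Read XML from stdin and write CSV to stdout, one line at a time,
    // so memory use stays flat regardless of input size.
    let stdin = io::stdin();
    let stdout = io::stdout();
    write_csv(stdin.lock(), stdout.lock())
}
```

Processing the input line by line keeps memory use independent of file size, which matters for an input too large to load at once. A real parser would need a proper XML reader to handle attributes and elements that span lines.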

Example

Parsing 1 million lines takes about 5.5 seconds:

$ mkdir uniparc
$ time bash -c "zcat tests/uniparc_1mil.xml.gz | uniparc_xml_parser >/dev/null"

real    0m5.564s
user    0m5.528s
sys     0m0.132s

The full uniparc_all.xml.gz file contains about 5 billion lines.
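For a rough sense of scale, the timing above can be extrapolated to the full file. The sketch below assumes throughput scales linearly with line count and that the hardware is the same as in the 1-million-line run; neither is confirmed by the README, and `estimate_seconds` is a hypothetical helper.

```rust
/// Estimate total run time by scaling a measured sample linearly.
/// Assumption: time per line is constant across the whole file.
fn estimate_seconds(sample_lines: f64, sample_seconds: f64, total_lines: f64) -> f64 {
    sample_seconds * (total_lines / sample_lines)
}

fn main() {
    // 1 million lines took 5.564 s (the "real" time above);
    // the full file is about 5 billion lines.
    let seconds = estimate_seconds(1.0e6, 5.564, 5.0e9);
    println!("estimated: {:.0} s (about {:.1} h)", seconds, seconds / 3600.0);
}
```

Under these assumptions the estimate comes to roughly 27,820 seconds, or a little under 8 hours, for a single pass over the full file.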

Commit count: 159
