Crates.io | biodata-parsers |
lib.rs | biodata-parsers |
version | 0.1.0 |
source | src |
created_at | 2017-01-15 01:56:35.198014 |
updated_at | 2017-01-15 01:56:35.198014 |
description | Scripts for parsing UniParc XML files downloaded from the UniProt website into CSV files. |
homepage | https://ostrokach.github.io/uniparc_xml_parser |
repository | https://github.com/ostrokach/uniparc_xml_parser |
id | 8074 |
size | 45,016 |
Process the UniParc XML file (uniparc_all.xml.gz) downloaded from the UniProt website into CSV files that can be loaded into a relational database.
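The core idea is to stream the XML from stdin and emit CSV rows as you go, without ever holding the whole file in memory. Below is a minimal Rust sketch of that streaming idea, using only the standard library. It is not this crate's implementation. The element names (accession, sequence, entry), the assumption that each element sits on one line, the two-column output schema, and the helper names xml_to_csv, element_text and csv_escape are all hypothetical. A production parser should use a real XML library.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Quote a CSV field (RFC 4180 style) if it contains a comma, quote, or newline.
fn csv_escape(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Return the text between `<tag ...>` and `</tag>` when both appear on `line`.
/// Hypothetical helper: assumes an element never spans multiple lines.
fn element_text<'a>(line: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}", tag);
    let close = format!("</{}>", tag);
    let start = line.find(&open)?;
    let content_start = start + line[start..].find('>')? + 1;
    let content_end = content_start + line[content_start..].find(&close)?;
    Some(&line[content_start..content_end])
}

/// Stream XML lines from `input` and write one `accession,sequence` CSV row
/// per `</entry>`. Returns the number of data rows written.
/// The tag names and output columns are illustrative assumptions.
fn xml_to_csv<R: BufRead, W: Write>(input: R, mut out: W) -> io::Result<usize> {
    writeln!(out, "accession,sequence")?;
    let mut accession: Option<String> = None;
    let mut sequence: Option<String> = None;
    let mut rows = 0;
    for line in input.lines() {
        let line = line?;
        if let Some(text) = element_text(&line, "accession") {
            accession = Some(text.trim().to_string());
        } else if let Some(text) = element_text(&line, "sequence") {
            sequence = Some(text.trim().to_string());
        } else if line.contains("</entry>") {
            // `take()` resets both fields so values never leak into the next entry.
            if let (Some(acc), Some(seq)) = (accession.take(), sequence.take()) {
                writeln!(out, "{},{}", csv_escape(&acc), csv_escape(&seq))?;
                rows += 1;
            }
        }
    }
    out.flush()?;
    Ok(rows)
}

fn main() -> io::Result<()> {
    // Read from stdin so the program can sit at the end of a `zcat ... |` pipeline.
    let stdin = io::stdin();
    let stdout = io::stdout();
    let rows = xml_to_csv(stdin.lock(), BufWriter::new(stdout.lock()))?;
    eprintln!("wrote {} rows", rows);
    Ok(())
}
```

Reading line by line through a buffered reader keeps memory use flat no matter how large the input is. That matters for a multi-billion-line file.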
Parsing 1 million lines takes about 5.5 seconds:
$ mkdir uniparc
$ time bash -c "zcat tests/uniparc_1mil.xml.gz | uniparc_xml_parser >/dev/null"
real 0m5.564s
user 0m5.528s
sys 0m0.132s
The full uniparc_all.xml.gz file is about 5 billion lines.
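The benchmark above works out to about 180,000 lines per second. If that rate held for the full file, the whole run would take 5,000 times the 1-million-line timing, roughly 27,800 s or about 7.7 hours. That figure is a back-of-the-envelope estimate, not a measured result. It assumes throughput stays constant at full scale, which real disk I/O and compression may not allow. The helper name extrapolate_secs is hypothetical.

```rust
/// Linearly extrapolate a benchmark: seconds needed for `target_lines`
/// given that `sample_lines` took `sample_secs`.
/// Assumes constant throughput (an assumption, not a measurement).
fn extrapolate_secs(sample_lines: f64, sample_secs: f64, target_lines: f64) -> f64 {
    sample_secs * target_lines / sample_lines
}

fn main() {
    // Figures from the `time` run above (real 0m5.564s for 1 million lines).
    let sample_lines = 1.0e6;
    let sample_secs = 5.564;
    let full_file_lines = 5.0e9; // "about 5 billion"

    let throughput = sample_lines / sample_secs;
    let estimate = extrapolate_secs(sample_lines, sample_secs, full_file_lines);
    println!("throughput: about {:.0} lines/s", throughput);
    println!("estimated full run: about {:.0} s ({:.1} h)", estimate, estimate / 3600.0);
}
```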