# Read Apache ORC from Rust

[![test](https://github.com/DataEngineeringLabs/orc-format/actions/workflows/test.yml/badge.svg)](https://github.com/DataEngineeringLabs/orc-format/actions/workflows/test.yml)
[![codecov](https://codecov.io/gh/DataEngineeringLabs/orc-format/branch/main/graph/badge.svg?token=AgyTF60R3D)](https://codecov.io/gh/DataEngineeringLabs/orc-format)

Read [Apache ORC](https://orc.apache.org/) in Rust.

This repository is similar to [parquet2](https://github.com/jorgecarleitao/parquet2) and [Avro-schema](https://github.com/DataEngineeringLabs/avro-schema), providing a toolkit to:

* Read ORC files (proto structures)
* Read stripes (the conversion from proto metadata to memory regions)
* Decode stripes (the math of decoding stripes into e.g. booleans, runs of RLE, etc.)

It currently reads the following (logical) types:

* booleans
* strings
* integers
* floats

What is not yet implemented:

* Snappy, LZO decompression
* RLE v2 `Patched Base` decoding
* RLE v1 decoding
* Utility functions to decode non-native logical types:
    * decimal
    * timestamp
    * struct
    * list
    * union

## Run tests

```bash
python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install -U pyorc
venv/bin/python write.py
cargo test
```
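
## Sketch: decoding a boolean column

To give a feel for the "decode stripes" step listed above, here is a minimal sketch of how a boolean stream can be decoded, following the byte run-length and boolean encodings described in the ORC specification. The function names `decode_byte_rle` and `decode_booleans` are hypothetical and written only for illustration; they are not this crate's API, and the sketch assumes the stream has already been decompressed.

```rust
/// Decodes ORC byte run-length encoding (hypothetical helper, not this crate's API).
///
/// Each group starts with a signed header byte:
/// * `0..=127`: a run of `header + 3` copies of the next byte;
/// * `-128..=-1`: `-header` literal bytes follow.
///
/// Returns `None` if the input ends in the middle of a group.
fn decode_byte_rle(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let header = input[i] as i8;
        i += 1;
        if header >= 0 {
            // Run: repeat the next byte `header + 3` times.
            let len = header as usize + 3;
            let value = *input.get(i)?;
            i += 1;
            out.resize(out.len() + len, value);
        } else {
            // Literals: copy the next `-header` bytes verbatim.
            // Widen to i16 first so that -128 can be negated safely.
            let len = (-(header as i16)) as usize;
            let literals = input.get(i..i + len)?;
            i += len;
            out.extend_from_slice(literals);
        }
    }
    Some(out)
}

/// Decodes `num_values` booleans (hypothetical helper, not this crate's API).
///
/// Booleans are bit-packed, most significant bit first, and the resulting
/// bytes are byte-RLE encoded. Returns `None` if there are too few bits.
fn decode_booleans(input: &[u8], num_values: usize) -> Option<Vec<bool>> {
    let bytes = decode_byte_rle(input)?;
    if bytes.len() * 8 < num_values {
        return None;
    }
    Some(
        (0..num_values)
            .map(|n| (bytes[n / 8] >> (7 - n % 8)) & 1 == 1)
            .collect(),
    )
}
```

With these definitions, `decode_booleans(&[0xff, 0x80], 8)` yields one `true` followed by seven `false` values, and `decode_byte_rle(&[0x61, 0x00])` yields 100 zero bytes, matching the examples given in the ORC specification. Returning `Option` rather than panicking keeps truncated input a recoverable condition; a real reader would likely report a more descriptive error.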