parquet2json

Crates.io	parquet2json
lib.rs	parquet2json
version	4.3.0
created_at	2021-07-16 16:59:31.218284+00
updated_at	2025-11-03 17:02:54.773056+00
description	A command-line tool for streaming Parquet as line-delimited JSON
repository	https://github.com/jupiter/parquet2json
id	423719
size	99,293

Pieter Raubenheimer (jupiter)

A command-line tool for streaming Parquet as line-delimited JSON.

It reads only the required ranges from local files, HTTP or S3 locations, and supports offset/limit and column selection.

It uses the Apache Parquet Official Native Rust Implementation, which has excellent support for compression formats and complex types.

How to use

Install from crates.io and execute from the command line, e.g.:

$ cargo install parquet2json
$ parquet2json --help

Usage: parquet2json <FILE> <COMMAND>

Commands:
  cat       Outputs data as JSON lines
  schema    Outputs the Thrift schema
  rowcount  Outputs only the total row count
  help      Print this message or the help of the given subcommand(s)

Arguments:
  <FILE>  Location of Parquet input file (file path, HTTP or S3 URL)

Options:
  -h, --help     Print help
  -V, --version  Print version

$ parquet2json cat --help

Usage: parquet2json <FILE> cat [OPTIONS]

Options:
  -o, --offset <OFFSET>    Starts outputting from this row (first row: 0, last row: -1) [default: 0]
  -l, --limit <LIMIT>      Maximum number of rows to output
  -c, --columns <COLUMNS>  Select columns by name (comma,separated,?prefixed_optional)
  -n, --nulls              Outputs null values
  -h, --help               Print help
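
The offset and limit options can be modelled on an in-memory list. This is only a sketch of which rows would be emitted, assuming a negative offset counts back from the end (as the "last row: -1" note in the help text suggests); `select_rows` is a hypothetical helper, not part of parquet2json, and the tool itself streams rows rather than loading them into a list.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def select_rows(rows: List[T], offset: int = 0, limit: Optional[int] = None) -> List[T]:
    """Return the rows that --offset/--limit would select (assumed semantics)."""
    # Assumption: a negative offset counts back from the end, so -1 is the last row.
    start = offset if offset >= 0 else max(len(rows) + offset, 0)
    # Without a limit, output runs to the final row.
    end = len(rows) if limit is None else start + limit
    return rows[start:end]
```

Under these assumptions, `select_rows(rows, offset=-1)` corresponds to `parquet2json <FILE> cat --offset -1` and yields only the last row.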

S3 Settings

Credentials are resolved as in the standard AWS toolchain: from environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), the AWS credentials file, or an IAM ECS container/instance profile.

The default AWS region must be set, either through the environment variable AWS_DEFAULT_REGION or in the AWS credentials file, and it must match the region of the object's bucket.
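
When the tool is launched from a script, the region can be set for that one invocation instead of globally. The following is a minimal sketch; `env_with_region` is a hypothetical helper name, and it only covers the environment-variable route described above.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_region(region: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with AWS_DEFAULT_REGION set."""
    # Copy so that neither os.environ nor the caller's mapping is modified.
    env = dict(os.environ if base is None else base)
    env["AWS_DEFAULT_REGION"] = region
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example `subprocess.run(["parquet2json", url, "rowcount"], env=env_with_region("us-west-2"))`, where `url` is an S3 location in that region.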

Examples

Use it to stream output to files and other tools such as grep and jq.

Output to a file

$ parquet2json ./myfile.parquet cat > output.jsonl
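
Because the output is line-delimited JSON, a program can also consume it row by row without writing a file first. The sketch below assumes each output line is one complete JSON object; `iter_rows` is a hypothetical helper, not part of parquet2json.

```python
import json
import subprocess
from typing import Dict, Iterator, List


def iter_rows(command: List[str]) -> Iterator[Dict]:
    """Run a command and yield one parsed JSON object per line of its stdout."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        # Reading line by line keeps memory use flat for large inputs.
        for line in proc.stdout:
            line = line.strip()
            if line:
                yield json.loads(line)
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
```

A call such as `iter_rows(["parquet2json", "./myfile.parquet", "cat", "--limit", "10"])` would then yield up to ten rows as dictionaries.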

From S3 or HTTP (S3)

$ parquet2json s3://overturemaps-us-west-2/release/2024-03-12-alpha.0/theme=base/type=land/part-00001-10ae8a61-702e-480f-9024-6dee4abd93df-c000.zstd.parquet cat

$ parquet2json https://overturemaps-us-west-2.s3.us-west-2.amazonaws.com/release/2024-03-12-alpha.0/theme%3Dbase/type%3Dland/part-00001-10ae8a61-702e-480f-9024-6dee4abd93df-c000.zstd.parquet cat

Filter selected columns with jq

$ parquet2json ./myfile.pq cat --columns=url,level | jq 'select(.level==3) | .url'
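
Where jq is not available, the same filter can be approximated in a few lines of Python. This is a sketch that assumes the `url` and `level` columns from the example above; `urls_at_level` is a hypothetical name. It uses `.get` because a field may be absent from a row when `--nulls` is not passed.

```python
import json
from typing import Iterable, Iterator, Optional


def urls_at_level(lines: Iterable[str], level: int = 3) -> Iterator[Optional[str]]:
    """Yield the url of every JSON line whose level equals the given value."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        # Rows without a "level" field never match.
        if row.get("level") == level:
            yield row.get("url")
```

To use it in a pipeline, iterate over `urls_at_level(sys.stdin)` in a script and pipe the output of `parquet2json ./myfile.pq cat --columns=url,level` into it.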

License

MIT

Commit count: 42