Crates.io | unnest-ndjson |
lib.rs | unnest-ndjson |
version | 0.1.1 |
source | src |
created_at | 2022-09-19 18:45:40.578453 |
updated_at | 2022-10-07 09:15:02.438759 |
description | Convert large JSON documents to ndjson/jsonlines |
homepage | |
repository | https://github.com/FauxFaux/unnest-ndjson |
max_upload_size | |
id | 669308 |
size | 30,127 |
This tool can unpack JSON objects into ndjson, also called jsonlines. ndjson is much easier to consume than JSON objects in some situations.
It takes a target depth, and an optional flag:
TARGET_DEPTH: how many levels of document to strip away
--path: include the path to the element, as the key
Say you have a JSON document that looks like:
[
{"name": "john", "class": "warrior"},
{"name": "sam", "class": "wizard"},
{"name": "alex", "class": "terrible"}
]
You could produce:
<array.json unnest-ndjson 1
{"name":"john","class":"warrior"}
{"name":"sam","class":"wizard"}
{"name":"alex","class":"terrible"}
That is, removing the outer array wrapper.
Or, with --path, it can produce:
{"key":[0],"value":{"name":"john","class":"warrior"}}
{"key":[1],"value":{"name":"sam","class":"wizard"}}
{"key":[2],"value":{"name":"alex","class":"terrible"}}
Or, with a TARGET_DEPTH of 2 and --path, it can produce:
{"key":[0,"name"],"value":"john"}
{"key":[0,"class"],"value":"warrior"}
{"key":[1,"name"],"value":"sam"}
{"key":[1,"class"],"value":"wizard"}
{"key":[2,"name"],"value":"alex"}
{"key":[2,"class"],"value":"terrible"}
A similar thing works for non-array documents, like:
{
"john": {"class": "warrior"},
"sam": {"class": "wizard"},
"alex": {"class": "terrible"}
}
You might want:
{"key":["john"],"value":{"class":"warrior"}}
{"key":["sam"],"value":{"class":"wizard"}}
{"key":["alex"],"value":{"class":"terrible"}}
It's quite fast, and uses very little memory. This could be useful if you wanted to process the resulting, much smaller, JSON documents with another application.
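For instance, a downstream consumer can read one small record per line instead of one huge document. Here is a minimal sketch in Rust using the serde_json crate, assuming the --path output shown above; the key/value field names come from the examples, everything else is illustrative:

use std::io::{self, BufRead};
use serde_json::Value;

fn main() -> io::Result<()> {
    // Each line on stdin is one complete, small JSON record from
    // unnest-ndjson, so memory use stays flat no matter how big the
    // original document was.
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line?;
        if line.is_empty() {
            continue;
        }
        let record: Value = serde_json::from_str(&line).expect("each line is valid JSON");
        // With --path, every record has a "key" (the path) and a "value".
        println!("{} -> {}", record["key"], record["value"]);
    }
    Ok(())
}

Piped together, something like <array.json unnest-ndjson 1 --path | ./consumer (a hypothetical build of the above) would keep both processes streaming.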
On 2019 hardware, it can process JSON at about 1GB/s, and needs approximately no memory to do so.
A 300MB file can be converted in 300ms, using 3MB of RAM, regardless of settings.
For comparison, jq takes over 2 seconds, and 350MB of RAM, to read the same file, even if it is not printing any of it.
It's a custom JSON "parser" (scanner? bracket matcher?), which doesn't try to actually load the JSON into memory, or decode any of its idiosyncrasies.
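The core idea can be sketched as a depth counter over the raw bytes: skip string contents, bump the depth on { and [, drop it on } and ], and emit a slice whenever a container closes back at the target depth. A heavily simplified sketch of that technique, not the crate's actual code: it only handles object/array values at the target depth, assumes well-formed input, and ignores --path entirely:

/// Emit every object/array value sitting at `target_depth` in `input`.
/// Sketch only: assumes well-formed JSON, skips non-container values.
fn unnest(input: &str, target_depth: usize) -> Vec<String> {
    let mut out = Vec::new();
    let (mut depth, mut in_string, mut escaped) = (0usize, false, false);
    let mut start = None;
    for (i, b) in input.bytes().enumerate() {
        if in_string {
            // Inside a string: only look for the (unescaped) closing quote.
            match b {
                _ if escaped => escaped = false,
                b'\\' => escaped = true,
                b'"' => in_string = false,
                _ => {}
            }
        } else {
            match b {
                b'"' => in_string = true,
                b'{' | b'[' => {
                    if depth == target_depth {
                        start = Some(i); // a value at the target depth opens here
                    }
                    depth += 1;
                }
                b'}' | b']' => {
                    depth -= 1;
                    if depth == target_depth {
                        // That value just closed: emit it as one ndjson line.
                        if let Some(s) = start.take() {
                            out.push(input[s..=i].to_string());
                        }
                    }
                }
                _ => {}
            }
        }
    }
    out
}

fn main() {
    let doc = r#"[{"name":"john","class":"warrior"},{"name":"sam","class":"wizard"}]"#;
    for item in unnest(doc, 1) {
        println!("{item}");
    }
}

Real input also has strings, numbers, and literals at the target depth, plus the key bookkeeping for --path, which is where the actual implementation earns its keep.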
MIT OR Apache-2.0