Crates.io | headj |
lib.rs | headj |
version | 0.1.1 |
source | src |
created_at | 2022-09-28 02:56:36.41909 |
updated_at | 2022-09-30 02:01:43.508966 |
description | A utility that converts input JSON arrays into valid JSON that contains only a subset of the elements |
homepage | |
repository | https://github.com/evanjpw/headj-rs |
max_upload_size | |
id | 675370 |
size | 41,351 |
A utility that converts input JSON arrays into valid JSON that contains only a subset of the elements
A utility to take enormous JSON files & cut them down to size.
Sometimes one has a JSON file with a very, very large array in it, but you really would like to have a subset of the data. For example, you have a JSON file representing a DB dump of millions of records & you'd like to have a workable number of rows to test with, or code with, or just examine.
You could use an editor, but even if the editor can load a file that large, it's likely to be very unpleasant to use.
You could just use some kind of text processor (like the unix/linux head
command)
but there are two issues with that:
You could write a script/program to do the work for you. I did that. Then I decided to package it up, so you don't have to.
headj
is a command line utility, similar to the head
command, for producing a subset of a JSON file that is
itself valid JSON. It allows you to ingest JSON containing a huge JSON array I produce JSON with a manageable JSON
array.
It appears to work, but I'm working on it.
One very large caveat is: It freely discards JSON that surrounds the array of interest. So, if you have
a complex JSON object with a huge array in it, you will get a JSON file back that only includes the
(reduced) array & whatever JSON structure it was found in. Everything else will be elided. (This actually works correctly now)
For example:
Input
{
"a": 1,
"b": [
1,
2,
3,
4,
5
],
"c": true
}
command: headj --key 'b' --count 3
output
{
"b": [
1,
2,
3
]
}
Cargo
cargo install headj
USAGE:
headj [OPTIONS] [INPUT_FILE]
ARGS:
<INPUT_FILE> The JSON file to read from. If none is specified, reads from Standard Input
OPTIONS:
-c, --count <COUNT> Number of elements to copy to the output (default: 100) [default:
100]
-d, --debug Activate extra debugging output
-f, --format-output Nicely format the output JSON with indentation & newlines
-h, --help Print help information
-k, --key <KEY> The JSON key of the array to copy from. If none specified, treat
the input JSON as an array
-n, --no-context Output _only_ the target JSON array
-o, --out-file <OUT_FILE> File to write the JSON results to (default: Standard Output)
-q, --quiet Don't print any status, diagnostic or error messages
-s, --skip <SKIP> Number of elements to skip before copying (default: 0) [default: 0]
-V, --version Print version information
headj <<- JSON
[1,2,3,4,5]
JSON
# Output: [1, 2, 3, 4, 5]
headj -c 1 <<- JSON
[1,2,3,4,5]
JSON
# Output: [1]
headj -c 1 -s 2 <<- JSON
[1,2,3,4,5]
JSON
# Output: [3]
headj -c 2 -s 2 <<- JSON
[1,2,3,4,5]
JSON
# Output: [3, 4]
headj -k 'foo' <<- JSON
{"foo":[1,2,3,4,5]}
JSON
# Output: {"foo": [1, 2, 3, 4, 5]}
headj -k 'foo' -n <<- JSON
{"foo":[1,2,3,4,5]}
JSON
# Output: [1, 2, 3, 4, 5]
headj -c 2 -s 2 <<- JSON
[1,2,3,4,5]
JSON
# Output: [3, 4]
headj -c 25 -s 2 <<- JSON
[1,2,3,4,5]
JSON
# Output: [3, 4, 5]
headj -c 2 -s 2 <<- JSON
[1,2,3,4,5]
JSON
# Output: [3, 4]
headj -c 2 -s 2 -f <<- JSON
[1,2,3,4,5]
JSON
# Output: [\n 3,\n 4\n]
headj -k 'foo.bar' -c 2 -s 2 -n <<- JSON
{"foo":{"bar":[1,2,3,4,5]}}
JSON
# Output: [3, 4]
headj -k 'foo.bar' -c 2 -s 2 <<- JSON
{"foo":{
"bar":[1,2,3,4,5]}
}
JSON
# Output: {"bar": {"foo": [3, 4]}}
headj -k 'foo' -c 2 -s 2 <<- JSON
{"foo":[1,2,3,4,5]}
JSON
# Output: {"foo": [3, 4]}
headj -k 'foo' -c 2 -s 2 -n <<- JSON
{"foo":[1,2,3,4,5]}
JSON
# Output: [3, 4]
--key
is based, in the most vague sense, on the JSON Schema specification.
It's more of a gesture in the general direction of JSON Schema. The reason that it doesn't use the full
JSON Schema is, the most natural way to write keys if you don't feel like reading a specification would
not necessarily start at the root, which would be potentially confusing. Insisting that users begin
with a '$
' would likely seem arbitrary & annoying. So, the "just dots & backslashes" implementation seemed
reasonable.--format
option currently DOES NOTHING. Working on it.MIT