Crates.io | genson-cli |
lib.rs | genson-cli |
version | 0.4.2 |
created_at | 2025-08-20 15:12:35.045149+00 |
updated_at | 2025-09-25 21:54:23.320873+00 |
description | Command-line interface for JSON schema inference |
homepage | https://github.com/lmmx/polars-genson |
repository | https://github.com/lmmx/polars-genson |
id | 1803541 |
size | 88,921 |
A command-line tool for JSON schema inference with support for both regular JSON and NDJSON input.
Built on top of genson-core, this CLI tool provides a simple yet powerful interface for generating JSON schemas from JSON data files or standard input.
It was written mainly for testing, but it may also be useful in its own right as a standalone binary for schema inference.
cargo binstall genson-cli
or use regular cargo install if you prefer building from source.
# From a JSON file
genson-cli data.json
# From standard input
echo '{"name": "Alice", "age": 30}' | genson-cli
# From stdin with multiple JSON objects
cat multiple-objects.json | genson-cli
# Process newline-delimited JSON
genson-cli --ndjson data.jsonl
# From stdin
cat events.ndjson | genson-cli --ndjson
# Treat top-level arrays as object streams (default)
genson-cli data.json
# Preserve array structure
genson-cli --no-ignore-array array-data.json
genson-cli - JSON schema inference tool
USAGE:
genson-cli [OPTIONS] [FILE]
ARGS:
<FILE> Input JSON file (reads from stdin if not provided)
OPTIONS:
-h, --help Print this help message
--no-ignore-array Don't treat top-level arrays as object streams
--ndjson Treat input as newline-delimited JSON
--avro Output Avro schema instead of JSON Schema
--normalise Normalise the input data against the inferred schema
--coerce-strings Coerce numeric/boolean strings to schema type during normalisation
--keep-empty Keep empty arrays/maps instead of turning them into nulls
--map-threshold <N> Treat objects with >N keys as map candidates (default 20)
--force-type k:v,... Force field(s) to 'map' or 'record'
Example: --force-type labels:map,claims:record
--map-encoding <mode> Choose map encoding (mapping|entries|kv)
mapping = Avro/JSON object (shared dict)
entries = list of single-entry objects (individual dicts)
kv = list of {key,value} objects
--wrap-root <field> Wrap top-level schema under this required field
EXAMPLES:
genson-cli data.json
echo '{"name": "test"}' | genson-cli
genson-cli --ndjson multi-line.jsonl
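The three --map-encoding modes described in the help text can be pictured with a small Python sketch. The function name encode_map is hypothetical and is not part of genson-cli. The shapes follow the one-line descriptions above, and the "key"/"value" field names are taken from the kv description. How the CLI applies each mode internally is not stated here.

```python
def encode_map(data, mode):
    """Render one key/value map in the given encoding (illustrative only)."""
    if mode == "mapping":
        # A plain JSON/Avro-style object: one shared dict.
        return dict(data)
    if mode == "entries":
        # A list of single-entry objects: one dict per key.
        return [{key: value} for key, value in data.items()]
    if mode == "kv":
        # A list of {key, value} objects.
        return [{"key": key, "value": value} for key, value in data.items()]
    raise ValueError(f"unknown map encoding: {mode}")


labels = {"en": "Hello", "fr": "Bonjour"}
for mode in ("mapping", "entries", "kv"):
    print(mode, encode_map(labels, mode))
```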
Normalisation rewrites raw JSON data so that every record conforms to a single inferred Avro schema. This is especially useful when input data is jagged, inconsistent, or comes from semi-structured sources.
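Features:
- Converts empty arrays/maps to null (default), or preserves them with --keep-empty.
- Fills missing fields with null values.
- Handles union types (e.g. ["null", "string"] where values may be either).
- Optionally coerces numeric/boolean strings to the schema type (with --coerce-strings).
Input: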
{"name": "Alice", "age": 30, "active": true}
Command:
echo '{"name": "Alice", "age": 30, "active": true}' | genson-cli
Output:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
},
"active": {
"type": "boolean"
}
},
"required": [
"age",
"active",
"name"
]
}
echo '{"name": "Alice", "age": 30, "active": true}' | genson-cli --avro
Output:
{
"type": "record",
"name": "document",
"namespace": "genson",
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "age",
"type": "int"
},
{
"name": "active",
"type": "boolean"
}
]
}
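The relationship between the two outputs above can be sketched in Python. This is a simplified model based only on the example shown (string, integer and boolean fields, all required, record name "document", namespace "genson"). The helper to_avro_record and the type table are hypothetical, not genson-core's conversion code, and optional or nested fields are deliberately left out.

```python
# Scalar type names as they appear in the two example outputs above.
JSON_TO_AVRO = {"string": "string", "integer": "int", "boolean": "boolean"}


def to_avro_record(schema, name="document", namespace="genson"):
    """Convert a flat JSON Schema object with required scalar fields to an Avro record."""
    required = set(schema.get("required", []))
    fields = []
    for field, spec in schema["properties"].items():
        if field not in required:
            # Optional fields are outside the scope of this sketch.
            raise NotImplementedError(f"optional field not modelled: {field}")
        fields.append({"name": field, "type": JSON_TO_AVRO[spec["type"]]})
    return {"type": "record", "name": name, "namespace": namespace, "fields": fields}


json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "active": {"type": "boolean"},
    },
    "required": ["age", "active", "name"],
}
avro = to_avro_record(json_schema)
print(avro)
```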
Input file (users.json):
{"name": "Alice", "age": 30, "scores": [95, 87]}
{"name": "Bob", "age": 25, "city": "NYC", "active": true}
{"name": "Charlie", "age": 35, "metadata": {"role": "admin"}}
Command:
genson-cli users.json
Output:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
},
"scores": {
"type": "array",
"items": {
"type": "integer"
}
},
"city": {
"type": "string"
},
"active": {
"type": "boolean"
},
"metadata": {
"type": "object",
"properties": {
"role": {
"type": "string"
}
},
"required": ["role"]
}
},
"required": ["age", "name"]
}
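In the output above, every key seen in any record becomes a property, while only the keys present in every record are listed under "required". A minimal Python sketch of that rule, assuming one JSON object per line as in users.json (summarise_keys is a hypothetical helper, not the genson-core algorithm):

```python
import json


def summarise_keys(lines):
    """Return (all keys in first-seen order, keys present in every record)."""
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        raise ValueError("no JSON records provided")
    seen = []
    for record in records:
        for key in record:
            if key not in seen:
                seen.append(key)
    # A key is required only if no record omits it.
    required = sorted(set.intersection(*(set(record) for record in records)))
    return seen, required


users = [
    '{"name": "Alice", "age": 30, "scores": [95, 87]}',
    '{"name": "Bob", "age": 25, "city": "NYC", "active": true}',
    '{"name": "Charlie", "age": 35, "metadata": {"role": "admin"}}',
]
properties, required = summarise_keys(users)
print(properties)
print(required)
```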
Input file (events.ndjson):
{"event": "login", "user": "alice", "timestamp": "2024-01-01T10:00:00Z"}
{"event": "logout", "user": "alice", "timestamp": "2024-01-01T11:00:00Z", "duration": 3600}
{"event": "login", "user": "bob", "timestamp": "2024-01-01T10:30:00Z", "ip": "192.168.1.100"}
Command:
genson-cli --ndjson events.ndjson
Output:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"event": {
"type": "string"
},
"user": {
"type": "string"
},
"timestamp": {
"type": "string"
},
"duration": {
"type": "integer"
},
"ip": {
"type": "string"
}
},
"required": ["event", "timestamp", "user"]
}
Input file (array.json):
[
{"id": 1, "name": "Product A"},
{"id": 2, "name": "Product B", "category": "electronics"}
]
Command (treat as object stream - default):
genson-cli array.json
Output:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"category": {
"type": "string"
}
},
"required": ["id", "name"]
}
Command (preserve array structure):
genson-cli --no-ignore-array array.json
Output:
{
"$schema": "http://json-schema.org/schema#",
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"category": {
"type": "string"
}
},
"required": ["id", "name"]
}
}
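The difference between the two commands above comes down to what counts as a "document" for inference. A rough Python sketch of that choice, with a hypothetical helper name and flag that mirror --no-ignore-array (the real CLI's parsing is not shown in this README):

```python
import json


def top_level_documents(text, ignore_array=True):
    """Split parsed input into the documents that schema inference would see."""
    value = json.loads(text)
    if ignore_array and isinstance(value, list):
        # Default: each array element is treated as its own document.
        return value
    # With ignore_array=False the array itself is the single document.
    return [value]


array_json = """
[
  {"id": 1, "name": "Product A"},
  {"id": 2, "name": "Product B", "category": "electronics"}
]
"""
as_stream = top_level_documents(array_json)
as_array = top_level_documents(array_json, ignore_array=False)
print(len(as_stream), len(as_array))
```

In the first case inference sees two objects and yields an object schema. In the second it sees one array and yields an array schema whose items describe the objects.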
Input (empty.json):
{"id": "Q1", "labels": {}}
{"id": "Q2", "labels": {"en": "Hello"}}
Command:
genson-cli --ndjson --normalise empty.json
Output:
{"id": "Q1", "labels": null}
{"id": "Q2", "labels": {"en": "Hello"}}
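The empty-value behaviour shown above can be approximated in a few lines of Python. This is a sketch of the rule as described in the help text (empty arrays/maps become null unless --keep-empty is given). The function normalise_empty is hypothetical and ignores the schema-driven parts of real normalisation.

```python
def normalise_empty(value, keep_empty=False):
    """Recursively replace empty dicts/lists with None unless keep_empty is set."""
    if isinstance(value, dict):
        if not value and not keep_empty:
            return None
        return {key: normalise_empty(item, keep_empty) for key, item in value.items()}
    if isinstance(value, list):
        if not value and not keep_empty:
            return None
        return [normalise_empty(item, keep_empty) for item in value]
    return value


records = [
    {"id": "Q1", "labels": {}},
    {"id": "Q2", "labels": {"en": "Hello"}},
]
print([normalise_empty(record) for record in records])
print([normalise_empty(record, keep_empty=True) for record in records])
```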
Input (stringy.json):
{"id": "42", "active": "true"}
{"id": 7, "active": false}
Command (default):
genson-cli --ndjson --normalise stringy.json
Output (no coercion: string values are not converted to the schema type and come out as null):
{"id": null, "active": null}
{"id": 7, "active": false}
Command (with coercion):
genson-cli --ndjson --normalise --coerce-strings stringy.json
Output:
{"id": 42, "active": true}
{"id": 7, "active": false}
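The two outputs above are consistent with a rule along these lines: a value that already has the target type is kept, a string is converted only when coercion is enabled, and anything else becomes null. The Python sketch below assumes the target types are integer for id and boolean for active, which is inferred from the outputs rather than stated by the tool. The helper names are hypothetical.

```python
def normalise_scalar(value, target, coerce_strings=False):
    """Fit one value to a target type ('integer' or 'boolean'), else return None."""
    if target == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        if coerce_strings and isinstance(value, str):
            try:
                return int(value)
            except ValueError:
                return None
        return None
    if target == "boolean":
        if isinstance(value, bool):
            return value
        if coerce_strings and value in ("true", "false"):
            return value == "true"
        return None
    raise ValueError(f"unsupported target type: {target}")


def normalise_record(record, types, coerce_strings=False):
    """Apply normalise_scalar to every field listed in types."""
    return {
        field: normalise_scalar(record.get(field), target, coerce_strings)
        for field, target in types.items()
    }


types = {"id": "integer", "active": "boolean"}  # assumed target types
stringy = {"id": "42", "active": "true"}
print(normalise_record(stringy, types))
print(normalise_record(stringy, types, coerce_strings=True))
```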
The CLI provides clear error messages for common issues:
$ echo '{"invalid": json}' | genson-cli
Error: Invalid JSON input at index 1: expected value at line 1 column 13 - JSON: {"invalid": json}
$ genson-cli nonexistent.json
Error: No such file or directory (os error 2)
$ echo '' | genson-cli
Error: No JSON strings provided
For a 100MB NDJSON file with 1M records:
The CLI tool is part of the larger polars-genson ecosystem:
# Extract schema from API responses
curl https://api.example.com/users | genson-cli > users-schema.json
# Process log files
genson-cli --ndjson application.log > log-schema.json
# Validate data structure
cat data.json | genson-cli | jq '.properties | keys'
# Generate schema for documentation
genson-cli sample-data.json > api-schema.json
# Validate API responses match expected schema
# (combine with tools like ajv-cli for validation)
# Understand structure of legacy data
genson-cli legacy-export.json > legacy-schema.json
# Compare schemas between different data sources
diff <(genson-cli source1.json) <(genson-cli source2.json)
For very large JSON files, consider using streaming tools:
# Process large file in chunks
split -l 10000 large-file.ndjson chunk_
for chunk in chunk_*; do
genson-cli --ndjson "$chunk" > "schema_${chunk}.json"
done
# Merge resulting schemas (requires additional tooling)
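As one possible form of the "additional tooling" mentioned above, the per-chunk schemas could be merged with a short Python script. This is a sketch under stated assumptions, not a feature of genson-cli: it handles object schemas of the shape shown in this README, takes the union of properties and the intersection of "required", and falls back to anyOf when two schemas for the same field disagree. merge_schemas is a hypothetical helper.

```python
from functools import reduce


def merge_schemas(a, b):
    """Merge two JSON Schemas produced from different chunks of the same data."""
    if a == b:
        return a
    if a.get("type") == "object" and b.get("type") == "object":
        props_a = a.get("properties", {})
        props_b = b.get("properties", {})
        merged = {}
        for key in list(props_a) + [k for k in props_b if k not in props_a]:
            if key in props_a and key in props_b:
                merged[key] = merge_schemas(props_a[key], props_b[key])
            else:
                merged[key] = props_a.get(key, props_b.get(key))
        # A field stays required only if every chunk required it.
        required = sorted(set(a.get("required", [])) & set(b.get("required", [])))
        out = {"type": "object", "properties": merged, "required": required}
        if "$schema" in a:
            out = {"$schema": a["$schema"], **out}
        return out
    # Conflicting non-object schemas: accept either one.
    return {"anyOf": [a, b]}


chunk_a = {
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "properties": {"event": {"type": "string"}, "duration": {"type": "integer"}},
    "required": ["duration", "event"],
}
chunk_b = {
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "properties": {"event": {"type": "string"}, "ip": {"type": "string"}},
    "required": ["event", "ip"],
}
merged = reduce(merge_schemas, [chunk_a, chunk_b])
print(merged)
```

In practice the list passed to reduce would be loaded with json.load from the schema_chunk_* files written by the loop above. Repeated type conflicts produce nested anyOf entries, which a fuller tool would flatten.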
The tool supports different schema versions:
# Default: http://json-schema.org/schema#
genson-cli data.json
# The schema URI is automatically included in output
This crate is part of the polars-genson project. See the main repository for the contribution and development docs.
Licensed under the MIT License. See [LICENSE](https://github.com/lmmx/polars-genson/blob/master/LICENSE) for details.