| Crates.io | hedl-json |
| lib.rs | hedl-json |
| version | 1.2.0 |
| created_at | 2026-01-08 11:14:00.616737+00 |
| updated_at | 2026-01-21 02:59:52.83278+00 |
| description | HEDL to/from JSON conversion |
| homepage | https://dweve.com |
| repository | https://github.com/dweve/hedl |
| max_upload_size | |
| id | 2030003 |
| size | 636,184 |
HEDL's integration with the JSON ecosystem—bidirectional conversion, JSONPath queries, schema generation, and streaming.
JSON is the universal data interchange format. Your APIs speak it, your databases accept it, your monitoring tools consume it, your LLM providers require it. Every token in a JSON payload costs money. Every extra byte adds latency. Every API call compounds the inefficiency.
hedl-json bridges HEDL's efficiency with JSON's ubiquity. Use HEDL's compact matrix notation internally—save 46.7% on tokens, 57.7% on payload size. When you need JSON compatibility, hedl-json handles the conversion seamlessly. Query HEDL documents with JSONPath. Generate JSON Schema for validation. Stream large JSON files without loading everything into memory.
Part of the HEDL format family alongside hedl-yaml, hedl-xml, hedl-csv, and hedl-parquet—bringing HEDL's efficiency to every ecosystem you work in.
Based on 6,333 lines of Rust across 7 modules. Add the crate to your Cargo.toml:
[dependencies]
hedl-json = "1.2"
Convert HEDL's compact representation to JSON when you need API compatibility:
use hedl_json::{to_json, to_json_value, ToJsonConfig};
let doc = hedl_core::parse(br#"
%STRUCT: User: [id, name, email]
---
users: @User
| alice, Alice Smith, alice@example.com
| bob, Bob Jones, bob@example.com
"#)?;
// Configure JSON output
// Configure JSON output
let config = ToJsonConfig {
    include_metadata: false, // Don't add __type__, __schema__ fields
    flatten_lists: false,    // Keep matrix structure as object arrays
    include_children: true,  // Include nested entities
    ascii_safe: false,       // UTF-8 output (set true for ASCII-only)
};
// Convert to JSON string (for API responses)
let json_str = to_json(&doc, &config)?;
// {"users": [{"id": "alice", "name": "Alice Smith", "email": "alice@example.com"}, ...]}
// Or get serde_json::Value directly (for further processing)
let json_val = to_json_value(&doc, &config)?;
Token Efficiency: HEDL's matrix notation saves 46.7% tokens compared to verbose JSON arrays. Use HEDL internally, export to JSON only at system boundaries.
Parse JSON from external APIs into HEDL's structured data model:
use hedl_json::{from_json, from_json_value, from_json_value_owned, FromJsonConfig};
// From JSON string (e.g., API response)
let json = r#"{"name": "Alice", "age": 30, "active": true}"#;
let config = FromJsonConfig::default();
let doc = from_json(json, &config)?;
// From serde_json::Value (existing parsed JSON)
let value: serde_json::Value = serde_json::from_str(json)?;
// Borrows the value (value remains usable after conversion)
let doc = from_json_value(&value, &config)?;
// Or takes ownership for zero-copy efficiency
let doc = from_json_value_owned(value, &config)?;
FromJsonConfig enforces resource limits to prevent denial-of-service attacks from malicious JSON. Defaults are intentionally high for legitimate ML and data processing workloads:
use hedl_json::{from_json, FromJsonConfig};
// Default configuration (for trusted internal data)
let default = FromJsonConfig::default();
// max_depth: Some(10,000) levels (deep hierarchies, nested JSON)
// max_array_size: Some(10,000,000) elements (large datasets, batch processing)
// max_string_length: Some(100 MB) (embeddings, base64-encoded data)
// max_object_size: Some(100,000) keys (rich metadata, complex objects)
let json = r#"{"name": "Alice", "age": 30}"#;
let doc = from_json(json, &default)?;
For untrusted input (user uploads, external APIs, public endpoints), use stricter limits:
use hedl_json::{from_json, FromJsonConfig};
// Strict configuration (for untrusted external sources)
let strict = FromJsonConfig::builder()
    .max_depth(100)               // 100 levels
    .max_array_size(10_000)       // 10K elements
    .max_string_length(1_000_000) // 1 MB
    .max_object_size(1_000)       // 1K keys
    .build();
let json = r#"{"name": "Bob", "age": 25}"#;
let doc = from_json(json, &strict)?;
Exceeding limits returns JsonConversionError variants: MaxDepthExceeded, MaxArraySizeExceeded, MaxStringLengthExceeded, MaxObjectSizeExceeded.
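The guard behind these error variants can be sketched with a toy value type (a minimal illustration with hypothetical names; the real checks live inside hedl-json's conversion and return JsonConversionError):

```rust
// Toy stand-ins for illustration only; hedl-json performs equivalent
// checks during from_json() and reports JsonConversionError variants.
#[derive(Debug, PartialEq)]
enum LimitError {
    MaxDepthExceeded,
    MaxArraySizeExceeded,
}

enum Json {
    Number(f64),
    Array(Vec<Json>),
}

// Recursively walk a value, rejecting anything past the configured limits.
fn check(v: &Json, depth: usize, max_depth: usize, max_array: usize) -> Result<(), LimitError> {
    if depth > max_depth {
        return Err(LimitError::MaxDepthExceeded);
    }
    if let Json::Array(items) = v {
        if items.len() > max_array {
            return Err(LimitError::MaxArraySizeExceeded);
        }
        for item in items {
            check(item, depth + 1, max_depth, max_array)?;
        }
    }
    Ok(())
}

fn main() {
    // [[[1.0]]] nests three levels deep
    let deep = Json::Array(vec![Json::Array(vec![Json::Array(vec![Json::Number(1.0)])])]);
    assert_eq!(check(&deep, 0, 2, 100), Err(LimitError::MaxDepthExceeded));
    assert!(check(&deep, 0, 10, 100).is_ok());
    println!("limit checks behave as expected");
}
```

The important property is that the walk fails fast: a depth or size violation aborts the conversion before the rest of the payload is processed.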
When converting JSON arrays with repeated structure (common in API responses), hedl-json caches inferred schemas automatically:
use hedl_json::schema_cache::{SchemaCache, SchemaCacheKey};
let cache = SchemaCache::new(100); // Capacity: 100 schemas
// Cache is used automatically during from_json() for uniform arrays
// Manual cache usage (for advanced control):
let key = SchemaCacheKey::new(vec!["id".to_string(), "name".to_string()]);
cache.insert(key.clone(), vec!["id".to_string(), "name".to_string()]);
if let Some(schema) = cache.get(&key) {
    // Hit: 30-50% faster than re-inferring the schema
}
// Monitor cache performance
let stats = cache.statistics();
println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0);
println!("Hits: {}, Misses: {}, Evictions: {}",
    stats.hits, stats.misses, stats.evictions);
For 1000-row JSON arrays with repeated structure, schema caching provides 30-50% speedup over naive inference.
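A stripped-down sketch of the caching idea (ToySchemaCache is a hypothetical stand-in using only the standard library; the real SchemaCache is capacity-bounded with eviction and richer statistics):

```rust
use std::collections::HashMap;

// Minimal sketch of a schema cache keyed by an object's field names.
// To keep the example short, this toy version simply refuses inserts
// beyond capacity instead of evicting least-recently-used entries.
struct ToySchemaCache {
    map: HashMap<Vec<String>, Vec<String>>,
    capacity: usize,
    hits: u64,
    misses: u64,
}

impl ToySchemaCache {
    fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), capacity, hits: 0, misses: 0 }
    }

    fn get(&mut self, key: &[String]) -> Option<&Vec<String>> {
        if self.map.contains_key(key) {
            self.hits += 1;
            self.map.get(key)
        } else {
            self.misses += 1;
            None
        }
    }

    fn insert(&mut self, key: Vec<String>, schema: Vec<String>) {
        if self.map.len() < self.capacity {
            self.map.insert(key, schema);
        }
    }

    fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }
}

fn main() {
    let mut cache = ToySchemaCache::new(100);
    let key = vec!["id".to_string(), "name".to_string()];
    assert!(cache.get(&key).is_none()); // miss: schema must be inferred
    cache.insert(key.clone(), key.clone());
    assert!(cache.get(&key).is_some()); // hit: reuse the inferred schema
    println!("hit rate: {:.0}%", cache.hit_rate() * 100.0);
}
```

The payoff comes from uniform arrays: every row after the first shares the same field names, so the key repeats and inference is skipped.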
Query HEDL documents using standard JSONPath syntax (powered by serde_json_path):
use hedl_json::jsonpath::{query, query_first, query_single, query_exists, query_count, QueryConfig};
let doc = hedl_core::parse(br#"
users: @User[id, name, age]
| alice, Alice Smith, 30
| bob, Bob Jones, 25
| carol, Carol White, 35
"#)?;
let config = QueryConfig::default();
// Get all matches
let results = query(&doc, "$.users[?(@.age > 30)].name", &config)?;
// Returns: [serde_json::Value("Carol White")]
// Get first match (returns Option)
let first = query_first(&doc, "$.users[0].name", &config)?;
// Returns: Some(serde_json::Value("Alice Smith"))
// Get exactly one match (errors if 0 or multiple matches)
let single = query_single(&doc, "$.users[?(@.id == 'alice')].name", &config)?;
// Returns: serde_json::Value("Alice Smith")
// Check if any matches exist
let exists = query_exists(&doc, "$.users[?(@.age > 40)]", &config)?;
// Returns: false
// Count matches
let count = query_count(&doc, "$.users[*]", &config)?;
// Returns: 3
use hedl_json::jsonpath::{QueryConfig, QueryConfigBuilder};
let config = QueryConfig {
    include_metadata: false, // Don't add __type__ fields in results
    flatten_lists: false,    // Keep matrix structure
    include_children: true,  // Include nested data
    max_results: 100,        // Limit results (0 = unlimited)
};
// Or use builder
let config = QueryConfigBuilder::new()
    .include_metadata(false)
    .max_results(50)
    .build();
Generate JSON Schema Draft 7 from HEDL documents for validation and documentation:
use hedl_json::schema_gen::{generate_schema, generate_schema_value, SchemaConfig};
let doc = hedl_core::parse(br#"
%STRUCT: User: [id, name, email, age]
---
users: @User
| u1, Alice, alice@example.com, 30
"#)?;
let config = SchemaConfig::builder()
    .title("User API Schema")
    .description("Schema for user data endpoint")
    .schema_id("https://api.example.com/schemas/user.json")
    .strict(true)           // disallow additionalProperties
    .include_examples(true) // add example values from data
    .include_metadata(true) // include title/description/$id
    .build();
// Generate as JSON string (for documentation)
let schema_json = generate_schema(&doc, &config)?;
// Or as serde_json::Value (for programmatic use)
let schema_value = generate_schema_value(&doc, &config)?;
The schema generator infers JSON Schema formats from actual data:
Value-Based Inference (analyzed during schema generation):
// Field values → JSON Schema format annotation
"alice@example.com" → {"type": "string", "format": "email"}
"https://example.com" → {"type": "string", "format": "uri"}
"2024-01-15T10:30:00Z" → {"type": "string", "format": "date-time"}
"550e8400-e29b-41d4-a716-..." → {"type": "string", "format": "uuid"}
Name-Based Inference (fallback when values are ambiguous):
// Field names → format hints
"email" field → format: "email"
"url" field → format: "uri"
"created_at" field → format: "date-time"
"uuid" field → format: "uuid"
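The value-based rules can be approximated with simple string checks. This is a rough standard-library sketch of the idea, not hedl-json's actual detector, whose heuristics are presumably stricter:

```rust
// Simplified value-based format inference: map a string value to a
// JSON Schema "format" annotation, or None if nothing matches.
fn infer_format(value: &str) -> Option<&'static str> {
    // UUID: 8-4-4-4-12 hex groups with dashes at fixed positions
    let is_uuid = value.len() == 36
        && value.chars().enumerate().all(|(i, c)| match i {
            8 | 13 | 18 | 23 => c == '-',
            _ => c.is_ascii_hexdigit(),
        });
    if is_uuid {
        return Some("uuid");
    }
    // URI: recognized scheme prefix
    if value.starts_with("http://") || value.starts_with("https://") {
        return Some("uri");
    }
    // RFC 3339-style date-time: "YYYY-MM-DDThh:mm:ss..."
    let bytes = value.as_bytes();
    if bytes.len() >= 19 && bytes[4] == b'-' && bytes[7] == b'-' && bytes[10] == b'T' {
        return Some("date-time");
    }
    // Email: non-empty local part, domain containing a dot
    if let Some((local, domain)) = value.split_once('@') {
        if !local.is_empty() && domain.contains('.') {
            return Some("email");
        }
    }
    None
}

fn main() {
    assert_eq!(infer_format("alice@example.com"), Some("email"));
    assert_eq!(infer_format("https://example.com"), Some("uri"));
    assert_eq!(infer_format("2024-01-15T10:30:00Z"), Some("date-time"));
    assert_eq!(infer_format("550e8400-e29b-41d4-a716-446655440000"), Some("uuid"));
    assert_eq!(infer_format("plain text"), None);
    println!("format inference checks passed");
}
```

Name-based inference then acts as a fallback: if the values are ambiguous, the field name ("email", "url", "created_at", "uuid") supplies the hint instead.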
HEDL's %NEST declarations become nested object arrays in JSON Schema:
let doc = hedl_core::parse(br#"
%STRUCT: Team: [id, name]
%STRUCT: Member: [id, name, role]
%NEST: Team > Member
---
teams: @Team
| t1, Engineering
"#)?;
let schema = generate_schema_value(&doc, &SchemaConfig::default())?;
// Team schema includes:
// {
// "type": "object",
// "properties": {
// "id": {"type": "string"},
// "name": {"type": "string"},
// "members": {
// "type": "array",
// "items": {"$ref": "#/definitions/Member"}
// }
// }
// }
Stream elements from large JSON arrays incrementally:
use hedl_json::streaming::{JsonArrayStreamer, StreamConfig};
use std::fs::File;
// Open large JSON file: [{...}, {...}, {...}, ...]
let file = File::open("large_dataset.json")?;
let config = StreamConfig::default();
let streamer = JsonArrayStreamer::new(file, config)?;
let mut count = 0;
for result in streamer {
    let doc = result?; // Each array element as a HEDL document
    count += 1;
    // Process document: validate, transform, aggregate
}
println!("Processed {} documents", count);
Performance: Streaming is 1.2-2.1x faster than loading the full array and parsing it in one pass.
Stream JSONL files line-by-line with robust error handling:
use hedl_json::streaming::{JsonLinesStreamer, StreamConfig};
use std::fs::File;
let file = File::open("logs.jsonl")?; // One JSON object per line
let config = StreamConfig::default();
let mut streamer = JsonLinesStreamer::new(file, config);
// A `while let` loop (rather than `for result in streamer`) keeps the
// streamer accessible so line_number() can be called on errors.
while let Some(result) = streamer.next() {
    match result {
        Ok(doc) => {
            // Process valid log entry
        }
        Err(e) => {
            // Malformed line: log the error and continue
            eprintln!("Skipping malformed line {}: {}",
                streamer.line_number(), e);
        }
    }
}
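The tolerant line handling can be sketched as a pre-filter over raw lines. This is an illustration only, assuming (as an assumption, not documented behavior) that blank lines are skipped along with # comment lines; parsing of the surviving lines is omitted:

```rust
// Sketch of JSONL line handling: blank lines and `#` comment lines are
// dropped, and each surviving line is treated as one JSON object.
// (Assumption: blank-line skipping; actual parsing is left out.)
fn payload_lines(input: &str) -> Vec<&str> {
    input
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .collect()
}

fn main() {
    let jsonl = "# request log\n\n{\"status\": 200}\n{\"status\": 404}\n";
    let lines = payload_lines(jsonl);
    assert_eq!(lines, vec!["{\"status\": 200}", "{\"status\": 404}"]);
    println!("{} payload lines", lines.len());
}
```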
JSONL Features:
- Comment lines starting with # are ignored
- line_number() method for debugging

Write HEDL documents as JSONL for streaming output:
use hedl_json::streaming::JsonLinesWriter;
use std::fs::File;
let file = File::create("output.jsonl")?;
let mut writer = JsonLinesWriter::new(file);
for doc in documents {
    writer.write_document(&doc)?; // One document per line
}
writer.flush()?; // Ensure all data written
use hedl_json::streaming::StreamConfig;
use hedl_json::FromJsonConfig;
let config = StreamConfig {
    buffer_size: 64 * 1024,                   // 64 KB buffer (default)
    max_object_bytes: Some(10 * 1024 * 1024), // 10 MB per object (default)
    from_json: FromJsonConfig::default(),     // Security limits per object
    use_size_estimation: true,                // Efficient size checks (default)
    true_streaming: true,                     // Constant memory for arrays (default)
};
// Or use builder
let config = StreamConfig::builder()
    .buffer_size(128 * 1024)            // 128 KB buffer
    .max_object_bytes(50 * 1024 * 1024) // 50 MB per object
    .unlimited_object_size()            // Disable limit (use with caution)
    .from_json_config(FromJsonConfig::builder()
        .max_depth(100)
        .build())
    .use_size_estimation(true) // Efficient size checks
    .true_streaming(true)      // Constant memory mode
    .build();
| HEDL Type | JSON Output | Example |
|---|---|---|
| Scalars (null, bool, number, string) | Direct mapping | null, true, 42, "text" |
| Objects | JSON objects | {"key": "value"} |
| Arrays (tensors) | JSON arrays | [1, 2, 3] |
| @User:alice (reference) | {"@ref": "@User:alice"} | Special object format |
| $(x + 1) (expression) | "$(x + 1)" | String with $() wrapper |
| Matrix lists | Arrays of objects | [{"id": "a", "name": "Alice"}, ...] |
Example matrix list conversion:
users: @User[id, name]
| alice, Alice
| bob, Bob
Becomes:
{
"users": [
{"id": "alice", "name": "Alice"},
{"id": "bob", "name": "Bob"}
]
}
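The row-to-object mapping amounts to zipping each row against the declared field names. A toy standard-library sketch that emits JSON text directly and treats every value as a string (the real converter maps scalar types directly, per the table above):

```rust
// Sketch of matrix-list expansion: each `| ...` row zips against the
// declared field names to form one JSON object.
fn row_to_json(fields: &[&str], row: &[&str]) -> String {
    let pairs: Vec<String> = fields
        .iter()
        .zip(row)
        .map(|(f, v)| format!("\"{}\": \"{}\"", f, v))
        .collect();
    format!("{{{}}}", pairs.join(", "))
}

fn main() {
    let fields = ["id", "name"];
    assert_eq!(
        row_to_json(&fields, &["alice", "Alice"]),
        r#"{"id": "alice", "name": "Alice"}"#
    );
    println!("{}", row_to_json(&fields, &["bob", "Bob"]));
}
```

This is why the matrix form is so compact: the field names appear once in the header instead of being repeated in every object.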
| JSON Type | HEDL Result | Notes |
|---|---|---|
| Objects | HEDL objects | Nested structures preserved |
| Arrays | HEDL arrays | Uniform objects become matrix lists |
| {"@ref": "..."} | HEDL reference | Special format recognized |
| "$(...)" strings | HEDL expression | Pattern triggers expression parsing |
| Primitives | Direct mapping | Null, bool, number, string |
Schema Inference: Uniform object arrays are automatically converted to matrix lists with inferred schemas. Fields are sorted alphabetically with id first if present.
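The field-ordering rule can be sketched in a few lines (an illustration of the stated rule, not the crate's code):

```rust
// Sketch of inferred-schema field ordering: alphabetical, with `id`
// promoted to the front when present.
fn order_fields(mut fields: Vec<String>) -> Vec<String> {
    fields.sort();
    if let Some(pos) = fields.iter().position(|f| f == "id") {
        let id = fields.remove(pos);
        fields.insert(0, id);
    }
    fields
}

fn main() {
    let fields = vec!["name".to_string(), "id".to_string(), "email".to_string()];
    assert_eq!(order_fields(fields), vec!["id", "email", "name"]);
    println!("ordering check passed");
}
```

A deterministic ordering matters here: it makes inferred schemas stable across runs, which is what allows repeated structures to hit the schema cache.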
API Integration: Receive JSON from external APIs, convert to HEDL for structured processing, export back to JSON for responses. Save 46.7% on token costs for LLM API calls.
Data Pipelines: Read JSON logs/events, process with HEDL's structured model, export to CSV (hedl-csv) or Parquet (hedl-parquet) for analytics.
Configuration Management: Store configs in HEDL with schema validation (hedl-lint), export to JSON for runtime consumption by existing tools.
LLM Context Optimization: Convert verbose JSON prompts to HEDL (46.7% token savings), send compact HEDL to LLM provider's API (after JSON conversion at the boundary).
Schema Documentation: Generate JSON Schema from HEDL documents for API documentation, OpenAPI specs, and validation tools.
Log Processing: Stream large JSONL log files, filter/transform with HEDL's query API, aggregate statistics without full memory load.
Schema Preservation: JSON has no schema concept. HEDL's %STRUCT, %NEST, %ALIAS declarations are lost in JSON conversion. If you need validation after round-tripping through JSON, redefine schemas explicitly in HEDL.
Validation: hedl-json converts formats faithfully; it does not validate data against schemas. For schema validation, use hedl-lint.
Optimization: structures are converted as-is, not restructured for efficiency. Verbose JSON becomes verbose HEDL. To leverage HEDL's matrix efficiency, restructure data into uniform arrays intentionally.
True Array Streaming: JsonArrayStreamer loads the entire JSON array into memory (limitation of serde_json). For true incremental processing, use JsonLinesStreamer with JSONL format.
- serde_json 1.0 - JSON parsing and serialization
- serde_json_path 0.7 - JSONPath query engine
- hedl-core 1.0 - HEDL parsing and data model
- thiserror 1.0 - Error type definitions

Conversion: HEDL → JSON is serialization-bound. JSON → HEDL is parsing-bound.
Caching: Schema inference with caching provides 30-50% speedup for repeated structures in JSON arrays.
Streaming:
JSONPath: Query performance depends on serde_json_path implementation. Queries execute on JSON representation (HEDL → JSON conversion happens first).
Detailed performance benchmarks are available in the HEDL repository benchmark suite.
Apache-2.0