large-json-array

Crates.iolarge-json-array
lib.rslarge-json-array
version0.1.1
created_at2025-06-06 10:39:19.524059+00
updated_at2025-06-08 02:30:44.8282+00
descriptionA streaming JSONโ€array parser that yields one object at a time
homepage
repositoryhttps://github.com/bambolelooo/large-json-array
max_upload_size
id1702800
size31,528
ilya (bambolelooo)

documentation

README

large-json-array

Stream large JSON arrays in Rust using an efficient iterator-based approach.

๐Ÿš€ Overview

Serde's default behavior loads entire JSON documents into memory. When dealing with very large JSON arrays, this becomes inefficient or outright impossible. This crate provides a custom JsonStream<R: Read> iterator that streams and parses objects from a JSON array one-by-one without allocating the entire array in memory.

๐Ÿ”ง Features

  • Memory-efficient streaming of large JSON arrays
  • Works with any Read source (e.g., files, network streams)
  • Integrates with serde_json::Value
  • Robust bracket tracking and string-state handling

๐Ÿ›  Example

use std::fs::File;
use std::io::BufReader;
use large_json_array::json_stream::JsonStream;
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("huge.json")?;
    let reader = BufReader::new(file);
    let stream = JsonStream::new(reader);

    for value in stream {
        match value {
            Ok(json_value) => println!("{:?}", json_value),
            Err(e) => eprintln!("Error: {}", e),
        }
    }

    Ok(())
}

You can also run the examples, provided in the /examples directory:

Generate a large json file first:

    cargo run --example generate_large_json  

Then you can find all the Johns in the json (with --release flag it will run faster):

    cargo run --release --example find_johns users_5.0_gb.json    

โš™๏ธ How It Works

The core struct is:

pub struct JsonStream<R: Read> {
    ...
}

It implements Iterator<Item = Result<Value, JsonError>> and maintains:

  • A buffer for reading chunks
  • A character accumulator for partial objects
  • Bracket and string context state

Parsing logic handles array delimiters, quoted strings, escape sequences, and ensures objects are emitted only when fully formed.

๐Ÿ“ฅ Installation

In your Cargo.toml:

[dependencies]
large-json-array = "0.1.0"

โ— Limitations

  • Assumes well-formed JSON arrays of objects (e.g., [ {...}, {...}, ... ])
  • Strings must be UTF-8 encoded
  • Not suitable for deeply malformed JSON (for now)

๐Ÿ“„ License

This project is licensed under the terms of the MIT license.


For more advanced use cases or improvements (e.g., error recovery, typed deserialization, or multi-threaded parsing), contributions and ideas are welcome!

Commit count: 36

cargo fmt