# JSN

_A queryable, streaming, JSON pull-parser with low allocation overhead._

- **Pull parser?** The parser is implemented as an iterator: you pull tokens
  from it one at a time, as you need them.
- **Streaming?** The JSON document being parsed is never fully loaded into
  memory. It is read & validated byte by byte. This makes it ideal for dealing
  with large JSON documents.
- **Queryable?** You can configure the parser to only emit & allocate tokens for
  the parts of the input you are interested in.
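To make "pull" and "streaming" concrete, here is a toy pull parser in Rust. It is a minimal sketch, not jsn's implementation. The `Tok` and `PullParser` names are hypothetical. It only splits input into tokens (no validation, no escape decoding) and reads through `Read::bytes`, so an unbuffered source such as a `File` would normally be wrapped in a `BufReader` first.

```rust
use std::io::{self, Bytes, Read};
use std::iter::Peekable;

/// Tokens yielded by this toy parser (hypothetical; not jsn's token type).
#[derive(Debug, PartialEq)]
enum Tok {
    Punct(char),    // one of { } [ ] : ,
    Str(String),    // string contents; escapes are kept verbatim, not decoded
    Scalar(String), // raw text of a number, `true`, `false` or `null`
}

/// A toy pull parser: nothing is read until the caller asks for the next
/// token, and only the bytes needed for that token are consumed from `R`.
struct PullParser<R: Read> {
    bytes: Peekable<Bytes<R>>,
}

fn is_delimiter(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r' | b',' | b':' | b'}' | b']')
}

impl<R: Read> PullParser<R> {
    fn new(reader: R) -> Self {
        PullParser { bytes: reader.bytes().peekable() }
    }

    /// Next byte inside a string; running out of input is an error there.
    fn next_in_string(&mut self) -> io::Result<u8> {
        match self.bytes.next() {
            Some(r) => r,
            None => Err(io::ErrorKind::UnexpectedEof.into()),
        }
    }

    fn read_string(&mut self) -> io::Result<Tok> {
        let mut buf = Vec::new();
        loop {
            match self.next_in_string()? {
                b'"' => break,
                b'\\' => {
                    // keep the backslash and the escaped byte so `\"` doesn't end the string
                    buf.push(b'\\');
                    buf.push(self.next_in_string()?);
                }
                other => buf.push(other),
            }
        }
        Ok(Tok::Str(String::from_utf8_lossy(&buf).into_owned()))
    }

    fn read_scalar(&mut self, first: u8) -> Tok {
        let mut buf = vec![first];
        // peek so the delimiter stays in the stream for the next call
        while let Some(Ok(b)) = self.bytes.peek() {
            let b = *b;
            if is_delimiter(b) {
                break;
            }
            buf.push(b);
            self.bytes.next();
        }
        Tok::Scalar(String::from_utf8_lossy(&buf).into_owned())
    }
}

impl<R: Read> Iterator for PullParser<R> {
    type Item = io::Result<Tok>;

    fn next(&mut self) -> Option<io::Result<Tok>> {
        loop {
            // `?` on the Option ends iteration once the input is exhausted
            let b = match self.bytes.next()? {
                Ok(b) => b,
                Err(e) => return Some(Err(e)),
            };
            match b {
                b' ' | b'\t' | b'\n' | b'\r' => continue, // skip whitespace
                b'{' | b'}' | b'[' | b']' | b':' | b',' => return Some(Ok(Tok::Punct(b as char))),
                b'"' => return Some(self.read_string()),
                _ => return Some(Ok(self.read_scalar(b))),
            }
        }
    }
}

fn main() -> io::Result<()> {
    // Any `Read` works; a byte slice keeps the demo self-contained.
    let input: &[u8] = br#"{"age": 43, "tags": ["a"]}"#;
    for tok in PullParser::new(input) {
        println!("{:?}", tok?);
    }
    Ok(())
}
```

Because the caller drives iteration, breaking out of the loop early means the rest of the input is never read.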

JSON is expected to conform to
[RFC 8259](https://datatracker.ietf.org/doc/html/rfc8259). However,
[newline-delimited JSON](https://github.com/ndjson/ndjson-spec) and
[concatenated JSON](https://en.wikipedia.org/wiki/JSON_streaming#Concatenated_JSON)
formats are also supported.
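In concatenated JSON, documents simply follow one another, optionally separated by whitespace, so newline-delimited JSON is effectively a special case. A parser can find document boundaries by tracking nesting depth while ignoring brackets inside strings. The sketch below illustrates the idea with a hypothetical `split_concatenated` helper over an in-memory `&str`. jsn exposes this as `Format::Concatenated` and works on a stream. Unlike a real parser, this sketch does not handle top-level scalars such as `1 2 3`.

```rust
/// Hypothetical helper: split concatenated JSON objects/arrays into separate
/// documents by tracking bracket depth. Brackets inside strings are ignored.
fn split_concatenated(input: &str) -> Vec<&str> {
    let mut docs = Vec::new();
    let mut depth = 0usize;
    let mut start = None; // byte offset where the current document began
    let mut in_string = false;
    let mut escaped = false;

    for (i, ch) in input.char_indices() {
        if in_string {
            // inside a string, brackets are plain text; `\"` does not end it
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' | '[' => {
                if depth == 0 {
                    start = Some(i);
                }
                depth += 1;
            }
            '}' | ']' => {
                depth = depth.saturating_sub(1);
                if depth == 0 {
                    // a top-level document just closed
                    if let Some(s) = start.take() {
                        docs.push(&input[s..=i]);
                    }
                }
            }
            _ => {} // whitespace between documents, scalars inside them
        }
    }
    docs
}

fn main() {
    let input = r#"{"name": "John Doe"} {"name": "Jane Doe"}
[1, 2, 3]"#;
    for doc in split_concatenated(input) {
        println!("{doc}");
    }
}
```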

Input can come from any source that implements the `std::io::Read` trait (e.g. a
file, a byte slice, a network socket, etc.).
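Since the input only has to implement `std::io::Read`, code that is generic over `R: Read` accepts all of these sources unchanged (`File` and `TcpStream` both implement it). The hypothetical `drain` helper below shows this with a byte slice, a `Cursor`, and `Read::chain`, which joins two readers into one stream: a cheap way to present two separate sources as a single concatenated-JSON input.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: consume any `Read` source and report how many bytes
/// it produced. Being generic over `R: Read`, it accepts byte slices,
/// files, sockets, cursors, or chains of them.
fn drain<R: Read>(mut reader: R) -> io::Result<u64> {
    io::copy(&mut reader, &mut io::sink())
}

fn main() -> io::Result<()> {
    let slice: &[u8] = br#"{"name": "John Doe"}"#;
    let cursor = Cursor::new(br#"{"name": "Jane Doe"}"#.to_vec());
    // `chain` yields all of `slice`, then all of `cursor`
    let chained = slice.chain(cursor);
    println!("{}", drain(chained)?); // prints 40
    Ok(())
}
```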

## Basic Usage

```rust
use jsn::{TokenReader, mask::*, Format};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "nicknames": [ "joe" ],
            "phone": {
                "carrier": "Verizon",
                "numbers": [ "+44 1234567", "+44 2345678" ]
            }
        }
        {
            "name": "Jane Doe",
            "age": 32,
            "nicknames": [ "J" ],
            "phone": {
                "carrier": "AT&T",
                "numbers": ["+33 38339"]
            }
        }
    "#;

    let mask = key("numbers").and(index(0))
        .or(key("name"))
        .or(key("age"));
    let mut iter = TokenReader::new(data.as_bytes())
        .with_mask(mask)
        .with_format(Format::Concatenated)
        .into_iter();

    assert_eq!(iter.next().unwrap()?, "John Doe");
    assert_eq!(iter.next().unwrap()?, 43);
    assert_eq!(iter.next().unwrap()?, "+44 1234567");
    assert_eq!(iter.next().unwrap()?, "Jane Doe");
    assert_eq!(iter.next().unwrap()?, 32);
    assert_eq!(iter.next().unwrap()?, "+33 38339");
    assert_eq!(iter.next(), None);

    Ok(())
}
```

## Quick Explanation

Like traditional streaming parsers, `jsn` emits JSON tokens. The twist is that
you can query them, i.e. decide up front which tokens get emitted at all. The
best analogy is
[bitmasks](https://stackoverflow.com/questions/10493411/what-is-bit-masking).

If you can use a bitwise `AND` to extract a bit pattern:

```text
input   : 0101 0101
AND
bitmask : 0000 1111
=
pattern : 0000 0101
```

Why can't you use a bitwise `AND` to extract a JSON token pattern?

```text
input     : { "hello": { "name" : "world" } }
AND
json mask : {something that extracts a "hello" key}
=
pattern   : _ ________ { "name" : "world" } _
```

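That `{something that extracts a "hello" key}` is what this crate provides: a
mask, built from functions like `key` and `index` and combined with `and` /
`or`, as in the Basic Usage example.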
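One way to picture how such a mask decides what to emit: give every value a path from the root (object keys and array indices), and make a mask a predicate over that path. The sketch below is a toy model built on that assumption. Its `key`, `index`, `and` and `or` mirror the names in the Basic Usage example, but `Seg`, `Mask` and the matching rule (a key or index matches if it appears anywhere on the path) are hypothetical and may differ from jsn's actual semantics. Under this rule, everything nested under `"hello"` matches `key("hello")`, as in the diagram. Keys are `&'static str` only to keep the sketch short.

```rust
/// One step on the path from the document root to a value (toy model).
#[derive(Debug)]
enum Seg {
    Key(&'static str),
    Index(usize),
}

/// A toy mask: a predicate deciding whether the value at `path` is emitted.
struct Mask(Box<dyn Fn(&[Seg]) -> bool>);

impl Mask {
    fn new(f: impl Fn(&[Seg]) -> bool + 'static) -> Mask {
        Mask(Box::new(f))
    }
    fn matches(&self, path: &[Seg]) -> bool {
        (self.0)(path)
    }
    /// Both masks must accept the path.
    fn and(self, other: Mask) -> Mask {
        Mask::new(move |path| self.matches(path) && other.matches(path))
    }
    /// Either mask may accept the path.
    fn or(self, other: Mask) -> Mask {
        Mask::new(move |path| self.matches(path) || other.matches(path))
    }
}

/// Matches any value at or below an object key named `name`.
fn key(name: &'static str) -> Mask {
    Mask::new(move |path| path.iter().any(|seg| matches!(seg, Seg::Key(k) if *k == name)))
}

/// Matches any value at or below array index `i`.
fn index(i: usize) -> Mask {
    Mask::new(move |path| path.iter().any(|seg| matches!(seg, Seg::Index(j) if *j == i)))
}

fn main() {
    // Same shape as the mask in Basic Usage.
    let mask = key("numbers").and(index(0)).or(key("name")).or(key("age"));
    let paths: [&[Seg]; 3] = [
        &[Seg::Key("name")],
        &[Seg::Key("phone"), Seg::Key("numbers"), Seg::Index(0)],
        &[Seg::Key("nicknames"), Seg::Index(0)],
    ];
    for path in paths {
        println!("{:?} -> {}", path, mask.matches(path));
    }
}
```

With the Basic Usage mask, this model accepts `name`, `age` and `phone.numbers[0]` but rejects `phone.numbers[1]` and `nicknames[0]`, which agrees with the tokens that example expects.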

## Memory Footprint

`jsn` allows you to select the parts of your JSON that are of interest. What you
do with those parts and how long you keep them in memory is up to you.

To illustrate this, I'll use the Valgrind DHAT tool to profile the heap memory
usage of two similar programs. Both programs read & extract keys from a JSON
file. I'll be using the sf-city-lots JSON file (189 MB) from
[here](https://raw.githubusercontent.com/zemirco/sf-city-lots-json/33c27c137784a96d0fbd7f329dceda6cc7f49fa3/citylots.json).

- `examples/store-tokens.rs`: keeps every extracted token in a `Vec`.
- `examples/print-tokens.rs`: prints each token as it is encountered, without
  keeping it.
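The difference between the two programs comes down to how tokens are consumed. The sketch below shows the two strategies side by side. `store_all` and `print_each` are hypothetical stand-ins, not the contents of the example files, and tokens are modeled as plain `String`s.

```rust
use std::io::{self, Write};

/// "store-tokens" strategy: keep every token, so memory grows with the
/// number of matches.
fn store_all<I: Iterator<Item = String>>(tokens: I) -> Vec<String> {
    tokens.collect()
}

/// "print-tokens" strategy: handle each token, then drop it, so at most
/// one token is alive at a time.
fn print_each<I, W>(tokens: I, out: &mut W) -> io::Result<usize>
where
    I: Iterator<Item = String>,
    W: Write,
{
    let mut count = 0;
    for token in tokens {
        writeln!(out, "{token}")?;
        count += 1;
        // `token` is dropped here, before the next one is produced
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Stand-in for a token stream: a lazy iterator of owned strings.
    let fake_tokens = || (0..3).map(|i| format!("token-{i}"));
    let stored = store_all(fake_tokens());
    println!("stored {} tokens", stored.len());
    let printed = print_each(fake_tokens(), &mut io::stdout().lock())?;
    println!("printed {printed} tokens");
    Ok(())
}
```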

```shell
valgrind --tool=dhat ./target/profiling/examples/store-tokens ~/downloads/citylots.json
# ==1146722== Total:     13,823,524 bytes in 196,541 blocks
# ==1146722== At t-gmax: 7,529,044 bytes in 196,515 blocks
```

```shell
valgrind --tool=dhat ./target/profiling/examples/print-tokens ~/downloads/citylots.json
# ==1152944== Total:     1,240,708 bytes in 196,524 blocks
# ==1152944== At t-gmax: 9,367 bytes in 9 blocks
```

The first number (Total) is the total amount of heap memory that was allocated
by the program during its execution.

The second number (At t-gmax) is the peak: the largest amount of heap memory
that was live (allocated and not yet freed) at any one time during execution.
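The gap between these two numbers is easy to reproduce in-process. Below is a sketch of a counting global allocator that records a running total (like Total) and the highest number of live bytes (like At t-gmax). It is only an approximation for illustration: DHAT instruments the program under Valgrind rather than replacing the allocator, and `Counting` and `measure` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

/// Wraps the system allocator and records byte counts.
struct Counting;

static TOTAL: AtomicUsize = AtomicUsize::new(0); // every byte ever allocated ("Total")
static LIVE: AtomicUsize = AtomicUsize::new(0); // bytes currently allocated
static PEAK: AtomicUsize = AtomicUsize::new(0); // highest LIVE value seen ("At t-gmax")

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            TOTAL.fetch_add(layout.size(), Relaxed);
            let live = LIVE.fetch_add(layout.size(), Relaxed) + layout.size();
            PEAK.fetch_max(live, Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Relaxed);
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

/// Run `f` and return (bytes allocated in total, peak extra live bytes).
fn measure(f: impl FnOnce()) -> (usize, usize) {
    let total_before = TOTAL.load(Relaxed);
    let live_before = LIVE.load(Relaxed);
    PEAK.store(live_before, Relaxed); // restart peak tracking from here
    f();
    let total = TOTAL.load(Relaxed) - total_before;
    let peak = PEAK.load(Relaxed).saturating_sub(live_before);
    (total, peak)
}

fn main() {
    let (store_total, store_peak) = measure(|| {
        // keep every "token" alive at once
        let kept: Vec<String> = (0..10_000).map(|i| i.to_string()).collect();
        std::hint::black_box(&kept);
    });
    let (stream_total, stream_peak) = measure(|| {
        // create and drop one "token" at a time
        for i in 0..10_000 {
            std::hint::black_box(i.to_string());
        }
    });
    println!("keep all:   total {store_total} B, peak {store_peak} B");
    println!("one by one: total {stream_total} B, peak {stream_peak} B");
}
```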

Unsurprisingly, `store-tokens.rs` has the larger footprint, peaking at about
7.5 MB. Still, the crate's utility is obvious: the total memory allocated over
the whole run (about 13.8 MB) is more than an order of magnitude less than the
size of the file (189 MB).

Things get better when you can operate on tokens immediately as they are
yielded (i.e. you do not accumulate them). Not only do you allocate less in
total (about 1.2 MB), but your peak footprint is much smaller: `print-tokens.rs`
ripped through the file while using at most about 9 KB (9,367 bytes) of heap
memory at any one time.
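The comparisons in this section follow from the DHAT figures with some quick arithmetic. The sketch assumes "189 MB" means 189 × 10^6 bytes and uses 1 KB = 1000 bytes. The constant names are illustrative.

```rust
// DHAT figures quoted above, in bytes.
const STORE_TOTAL: f64 = 13_823_524.0;
const STORE_PEAK: f64 = 7_529_044.0;
const PRINT_TOTAL: f64 = 1_240_708.0;
const PRINT_PEAK: f64 = 9_367.0;
// Assumption: "189 MB" is read as 189 × 10^6 bytes.
const FILE_SIZE: f64 = 189_000_000.0;

fn main() {
    // How many times smaller each program's total allocation is than the file
    println!("file / store-tokens total: {:.1}", FILE_SIZE / STORE_TOTAL); // 13.7
    println!("file / print-tokens total: {:.1}", FILE_SIZE / PRINT_TOTAL); // 152.3
    // Peak heap usage, compared between the two programs
    println!("store-tokens peak / print-tokens peak: {:.0}", STORE_PEAK / PRINT_PEAK); // 804
    println!("print-tokens peak: {:.1} KB", PRINT_PEAK / 1000.0); // 9.4
}
```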