Crates.io | wall-a |
lib.rs | wall-a |
version | 0.1.2 |
source | src |
created_at | 2024-08-08 07:15:31.783367 |
updated_at | 2024-08-08 17:14:59.802463 |
description | CLI tool for recording JSON in a compressed format |
repository | https://github.com/declanvk/wall-a |
id | 1329202 |
size | 93,289 |
The `wall-a` CLI tool is intended to support writing data into a binary format in the context of a git repository.
My initial idea was writing benchmark/profiling data to the git repository, and storing it in a format that would not cause issues for git (mostly).
I wanted to write one piece of data per commit, and then have some way to get the aggregate data back out so that I could visualize or otherwise use the benchmark data.
The tool has two commands:
- `append` - this command reads JSON data from STDIN and appends it to a staging file in a specified "data" directory. If the staging file grows too large, its contents are read, merged together, and then written in a binary format (CBOR) to a new "archive" file. The archive file has a timestamp as part of its filename, so it is ordered with respect to all previous archive files.
- `read` - this command reads all the archive files in order by filename, merges the values each contains, then reads and merges the staging file values as well. It then writes the final value to standard output.

It is important to note that the JSON data written by `append` is merged with all previous data when it is `read`. The merge function works like:
- `{"key": "value1", "some":"other"}` and `{"key": "value2", "un":"related"}` gives `{"key": "value2", "some":"other", "un":"related"}`
- `[1, 2, 3]` and `[4, 5, 6]` gives `[1, 2, 3, 4, 5, 6]`
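The two merge rules above can be sketched in a few lines of Rust. This is only an illustration, not the tool's actual code: the `Json` enum and the `merge` function are hypothetical stand-ins (a real implementation would more likely use a JSON library's value type). The examples only show top-level merging, so two further behaviors here are assumptions: nested values under the same key are merged recursively, and when the two values have different types the newer one replaces the older one.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical, for illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Merge `new` on top of `old`.
fn merge(old: Json, new: Json) -> Json {
    match (old, new) {
        // Objects: keep the union of keys. When both sides have the same key,
        // merge the two values recursively (assumption), so scalars end up
        // being replaced by the newer value.
        (Json::Object(mut a), Json::Object(b)) => {
            for (key, value) in b {
                let merged = match a.remove(&key) {
                    Some(existing) => merge(existing, value),
                    None => value,
                };
                a.insert(key, merged);
            }
            Json::Object(a)
        }
        // Arrays: concatenate, older elements first.
        (Json::Array(mut a), Json::Array(b)) => {
            a.extend(b);
            Json::Array(a)
        }
        // Scalars or mismatched types: the newer value wins (assumption).
        (_, new) => new,
    }
}
```

Folding `merge` over a sequence of values, oldest first, gives the single aggregate value that `read` is described as printing.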
The design is somewhat inspired by https://simonwillison.net/2020/Oct/9/git-scraping/: I wanted `git diff` to work for the most recent data. However, I didn't want a huge JSONL file that grew without bound, so as a compromise I added the idea of the "archive" file.
The "archive" file is just a snapshot of the staging file data, converted to a binary format. This binary file can be much smaller and faster to read than the staging file. The downside is that the file is binary and doesn't interact well with git. Each archive file is written only once, to reduce the number of copies of the file git needs to store in its history.
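Because archive files carry a timestamp in their filename and are read in filename order, the naming scheme has to sort the same way the files were created. A minimal sketch of one way to get that property is below. The `archive-<timestamp>.cbor` pattern and both function names are hypothetical, since this README does not state the actual filename format.

```rust
/// Build a hypothetical archive filename from a Unix timestamp in seconds.
/// Zero-padding to a fixed width makes plain string order match time order.
fn archive_name(unix_secs: u64) -> String {
    format!("archive-{:020}.cbor", unix_secs)
}

/// Return archive filenames in the order they would be read: sorted by name,
/// which (given the padding above) is also oldest first.
fn read_order(mut names: Vec<String>) -> Vec<String> {
    names.sort();
    names
}
```

Without the fixed width, a name built from `99` would sort after one built from `1723101331`, and older data would be merged on top of newer data.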
The staging file is just a newline-delimited JSON file (JSONL). This format is great for `git diff`, since you can easily see the newly added data and the data which was transferred to the archive file.
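As a rough sketch of the staging side, appending a record to a JSONL file and checking whether it has grown "too large" might look like the following. The function name `append_record` and the `MAX_STAGING_BYTES` limit are hypothetical (the real threshold is not stated here), and the sketch assumes the incoming JSON has already been serialized compactly onto a single line.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical size limit for the staging file; the real value may differ.
const MAX_STAGING_BYTES: u64 = 64 * 1024;

/// Append one single-line JSON document to the staging file.
/// Returns `true` when the file is now larger than the limit, which is the
/// point where the staged values would be merged and moved to an archive.
fn append_record(staging: &Path, json_line: &str) -> io::Result<bool> {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(staging)?;
    // JSONL: exactly one record per line, terminated by a single newline.
    writeln!(file, "{}", json_line.trim_end())?;
    file.flush()?;
    Ok(fs::metadata(staging)?.len() > MAX_STAGING_BYTES)
}
```

Since every call only adds a line at the end of the file, each commit's diff of the staging file shows just the newly recorded value.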