| Field | Value |
|---|---|
| Crates.io | json2bin |
| lib.rs | json2bin |
| version | 0.2.1 |
| created_at | 2024-07-11 00:21:45.813503+00 |
| updated_at | 2025-03-13 23:48:08.419928+00 |
| description | A fast jsonl to RWKV binidx converter in Rust |
| id | 1298926 |
| size | 224,589 |

A fast, multithreaded converter from JSON Lines (JSONL) to RWKV binidx files, written in Rust.

```
$ cargo install json2bin
$ json2bin -h
Json converter to RWKV binidx file format
Usage: json2bin [OPTIONS] --input <INPUT>
Options:
-i, --input <INPUT> Jsonlines file to read
-o, --output-dir <OUTPUT_DIR> Output directory for binidx files [default: -]
-t, --thread <THREAD> Number of threads [default: 8]
-v, --verbose Verbosity
-h, --help Print help
-V, --version Print version
```

The following command converts the JSONL file `src/sample.jsonl` into `src/sample.bin` and `src/sample.idx`:

```
$ json2bin -i src/sample.jsonl
```
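
As far as we know, RWKV's binidx files use the Megatron-LM / GPT-NeoX `MMapIndexedDataset` layout: the `.bin` file is every document's token IDs concatenated, and the `.idx` file records each document's length and byte offset. The sketch below builds both files in memory to illustrate that layout. It assumes `u16` token IDs (Megatron dtype code 8) and one sequence per document; `write_binidx` is a hypothetical name and this is not json2bin's own code.

```rust
/// Magic header of a Megatron-style MMIDIDX index file.
const MAGIC: &[u8; 9] = b"MMIDIDX\x00\x00";
/// Index format version.
const VERSION: u64 = 1;
/// Megatron dtype code for u16 token IDs (assumption: the vocab fits in u16).
const DTYPE_U16: u8 = 8;

/// Hypothetical helper: serialize tokenized documents into (bin, idx) byte buffers.
fn write_binidx(docs: &[Vec<u16>]) -> (Vec<u8>, Vec<u8>) {
    let mut bin = Vec::new();
    let mut sizes: Vec<i32> = Vec::with_capacity(docs.len());
    let mut pointers: Vec<i64> = Vec::with_capacity(docs.len());
    let mut offset: i64 = 0;

    for doc in docs {
        // .bin: all tokens concatenated as little-endian u16.
        for &tok in doc {
            bin.extend_from_slice(&tok.to_le_bytes());
        }
        sizes.push(doc.len() as i32);
        // Byte offset in .bin where this sequence starts.
        pointers.push(offset);
        offset += (doc.len() * std::mem::size_of::<u16>()) as i64;
    }

    // One sequence per document, so doc_idx is 0, 1, ..., n.
    let doc_idx: Vec<i64> = (0..=docs.len() as i64).collect();

    // .idx: header, then the sizes, pointers and doc_idx arrays (little-endian).
    let mut idx = Vec::new();
    idx.extend_from_slice(MAGIC);
    idx.extend_from_slice(&VERSION.to_le_bytes());
    idx.push(DTYPE_U16);
    idx.extend_from_slice(&(sizes.len() as u64).to_le_bytes());
    idx.extend_from_slice(&(doc_idx.len() as u64).to_le_bytes());
    for s in &sizes {
        idx.extend_from_slice(&s.to_le_bytes());
    }
    for p in &pointers {
        idx.extend_from_slice(&p.to_le_bytes());
    }
    for d in &doc_idx {
        idx.extend_from_slice(&d.to_le_bytes());
    }
    (bin, idx)
}

fn main() {
    // Two toy documents of 3 and 2 token IDs.
    let docs = vec![vec![10u16, 20, 30], vec![40u16, 0]];
    let (bin, idx) = write_binidx(&docs);
    println!("bin: {} bytes, idx: {} bytes", bin.len(), idx.len());
}
```

For the two toy documents, `bin` holds 10 bytes (5 tokens of 2 bytes each) and `idx` holds 82 bytes: a 34-byte header, 8 bytes of sizes, 16 of pointers and 24 of `doc_idx`.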
The output directory can be set with `--output-dir <OUTPUT_DIR>` or `-o <OUTPUT_DIR>`:

```
$ json2bin -i src/sample.jsonl -o output
```
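
Judging from the examples, the output files reuse the input's file stem with `.bin` and `.idx` extensions, and they are written next to the input unless `-o` is given. The sketch below captures that naming rule. It assumes the default value `-` means "the input file's directory", which is inferred from the example rather than documented, and `output_paths` is a hypothetical helper.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the .bin/.idx paths for an input file.
/// `output_dir` mirrors `-o/--output-dir`; "-" is ASSUMED to mean
/// "write next to the input file".
fn output_paths(input: &Path, output_dir: &str) -> (PathBuf, PathBuf) {
    let dir: PathBuf = if output_dir == "-" {
        input.parent().map(Path::to_path_buf).unwrap_or_default()
    } else {
        PathBuf::from(output_dir)
    };
    // Keep the whole stem ("data.v1.jsonl" -> "data.v1"), then append the extension.
    let stem = input.file_stem().expect("input path must name a file");
    let with_ext = |ext: &str| {
        let mut name = OsString::from(stem);
        name.push(ext);
        dir.join(name)
    };
    (with_ext(".bin"), with_ext(".idx"))
}

fn main() {
    let input = Path::new("src/sample.jsonl");
    let (bin, idx) = output_paths(input, "-");
    println!("{} {}", bin.display(), idx.display());
    let (bin, _) = output_paths(input, "output");
    println!("{}", bin.display());
}
```

Appending the extension to the stem, rather than calling `with_extension`, avoids clobbering inner dots in names such as `data.v1.jsonl`.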
The default number of threads is 8; change it with `--thread` or `-t`:

```
$ json2bin -i src/sample.jsonl -t 4
```
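
How json2bin divides work among its `--thread` workers is not described here. One common pattern, shown only as a sketch, is to cut the input lines into contiguous chunks, process each chunk on a scoped thread, and join the threads in spawn order so the output keeps the input's document order. The `tokenize` function is a hypothetical placeholder (one token per byte) standing in for the real RWKV tokenizer.

```rust
use std::thread;

/// Placeholder tokenizer (hypothetical): one token per byte.
fn tokenize(line: &str) -> Vec<u16> {
    line.bytes().map(u16::from).collect()
}

/// Tokenize `lines` on up to `threads` scoped threads, preserving input order.
fn tokenize_parallel(lines: &[String], threads: usize) -> Vec<Vec<u16>> {
    if lines.is_empty() {
        return Vec::new();
    }
    let threads = threads.max(1);
    // Ceiling division so every line lands in some chunk.
    let chunk_size = (lines.len() + threads - 1) / threads;
    thread::scope(|s| {
        let handles: Vec<_> = lines
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|l| tokenize(l)).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order keeps chunk order, hence document order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let lines: Vec<String> = (0..10).map(|i| format!("doc {i}")).collect();
    let out = tokenize_parallel(&lines, 4);
    println!("{} documents tokenized", out.len());
}
```

Scoped threads (`std::thread::scope`) let workers borrow slices of the input without cloning or `Arc`, and contiguous chunks make order preservation a simple concatenation.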
We converted a 19 GB JSONL file of English Wikipedia (20231101.en) to binidx format on an Apple M2 machine. Running with 7 threads, the Rust json2bin was 70 times faster than the Python json2binidx: