| Crates.io | stardex |
| lib.rs | stardex |
| version | 0.1.2 |
| created_at | 2025-11-26 04:50:46.564012+00 |
| updated_at | 2025-11-26 08:57:27.466792+00 |
| description | A zero-trust, streaming tar parser + per-file hasher for backup pipelines. |
| homepage | https://github.com/tpet93/stardex |
| repository | https://github.com/tpet93/stardex |
| max_upload_size | |
| id | 1950956 |
| size | 95,613 |
Streaming Tar Index (stardex) is a zero-trust, streaming tar parser and per-file hasher for backup pipelines.
It reads a tar stream from stdin, emits per-file metadata and hashes to stdout as JSONL (or other formats), and never modifies the stream, so it can sit behind tee in a pipeline.
Each record carries hash_algo and hash when hashing is performed, and non-UTF-8 paths are preserved via path_raw_b64.

Once published:

cargo install stardex

Or install from source:
git clone https://github.com/tpet93/stardex.git
cd stardex
cargo install --path .
tar -cf - my_directory | stardex > index.jsonl
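To spot-check the index, pretty-print the first record (this assumes jq is installed):

head -n 1 index.jsonl | jq .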
Calculate hashes while compressing and writing to a file (or tape):
tar -cf - /data \
| tee >(stardex --algo blake3 > index.jsonl) \
| zstd -T0 > backup.tar.zst
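Because the index is derived purely from the tar stream, one way to verify the backup later is to decompress it and re-run stardex; a sketch, assuming decompression reproduces the original stream byte-for-byte:

zstd -dc backup.tar.zst | stardex --algo blake3 | diff - index.jsonl && echo verified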
Calculate per-file hashes, a global tar hash, and a compressed archive hash in one pass:
tar -cf - directory \
| tee >(stardex --algo sha256 --global-hash sha256 --summary-out summary.json > index.jsonl) \
| zstd -T0 \
| tee >(sha256sum > archive.tar.zst.sha256) \
> archive.tar.zst
This produces:
- index.jsonl: Per-file metadata and SHA256 hashes.
- summary.json: Total tar size and SHA256 hash of the uncompressed tar stream.
- archive.tar.zst: The compressed archive.
- archive.tar.zst.sha256: SHA256 hash of the compressed archive.
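The .sha256 file was written from a pipe, so sha256sum recorded its input filename as -. A later integrity check can compare digests directly; a sketch using standard coreutils:

[ "$(sha256sum < archive.tar.zst | awk '{print $1}')" = "$(awk '{print $1}' archive.tar.zst.sha256)" ] && echo archive OK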
Run the benchmark script to see how fast stardex can go on your system:

./tests/benchmark.sh
Options:

- --algo <ALGO>: Hashing algorithm to use. Options: blake3 (default), sha256, md5, sha1, xxh64, xxh3, xxh128, none.
- --format <FORMAT>: Output format. Options: jsonl (default), csv, sql.
- --buffer-size <SIZE>: Set the read buffer size (default: 2M). Supports human-readable units (e.g., 64K, 1M, 10M).
- --no-fail: Drain stdin on error instead of exiting (prevents broken pipes).
- --init-sql: When using --format sql, emit the schema and wrap inserts in BEGIN; ... COMMIT; so you can pipe directly into sqlite3 file.sqlite.

Planned: stardex man page and shell completion generation, once the publishing bugs around those assets are resolved. Contributions welcome!

Only payload-bearing entry types are hashed (Regular, GNUSparse, Continuous). Metadata-only entries are still validated and emitted without hashes. --algo none disables hashing entirely but leaves all metadata intact.

PAX extended headers are parsed under a size cap (the STARDEX_PAX_MAX_SIZE env var can override it). Malformed length fields or oversized headers fail fast. PAX overrides for path, size, mtime, and mode are reflected in the top-level fields.

--no-fail drains stdin to EOF after an error to avoid breaking downstream pipes, and then exits with status 0 (so downstream tools stay running).

Example JSONL record:

{
"path": "my_directory/file.txt",
"path_is_utf8": true,
"path_raw_b64": null,
"file_type": "Regular",
"size": 1234,
"mode": 420,
"mtime": 1700000000,
"hash_algo": "blake3",
"hash": "...",
"pax": {
"path": "...",
"mtime": "..."
},
"offset": 0
}
path_raw_b64 is emitted when the tar entry name is not valid UTF-8, allowing lossless reconstruction of the original name without emitting raw tar bytes in the index. CSV and SQL formats contain the same fields (SQL output is emitted as INSERT statements with proper escaping). offset is the byte offset of the entry header within the tar stream.
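Because the fields are stable across formats, the JSONL output is easy to post-process. For example (a sketch assuming jq is available), extracting a hash/path manifest for regular files:

jq -r 'select(.file_type == "Regular") | [.hash, .path] | @tsv' index.jsonl > manifest.tsv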
SQL column order: path, path_is_utf8, path_raw_b64, file_type, size, mode, mtime, hash_algo, hash, pax (JSON), offset.
tar -cf - /path/to/dir \
| stardex --format sql --init-sql \
| sqlite3 archive.sqlite
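Once imported, the index can be queried like any other table. The table name below is an assumption, not confirmed by this README; check the schema emitted by --init-sql for the real name:

sqlite3 archive.sqlite 'SELECT path, size, hash FROM files ORDER BY size DESC LIMIT 10;'  # 'files' is a hypothetical table name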
License: MIT