| Crates.io | gzinspector |
| lib.rs | gzinspector |
| version | 0.2.4 |
| created_at | 2024-11-19 18:24:16.151325+00 |
| updated_at | 2024-11-19 18:45:32.106123+00 |
| description | A tool to inspect gzip/zlib compressed files (especially chunked textual files such as WARC, WET, WAT, CDX, ZipNum, etc.) |
| homepage | |
| repository | https://github.com/jt55401/gzinspector |
| max_upload_size | |
| id | 1453674 |
| size | 51,547 |
A robust command-line tool for inspecting and analyzing GZIP/ZLIB compressed files. GZInspector provides detailed information about compression chunks, headers, and content previews with support for both human-readable and JSON output formats.
Most GZIP implementations discard chunk boundaries during decompression since they're typically irrelevant for the decompressed output. However, certain file formats leverage GZIP chunks as a core feature, allowing selective decompression of individual chunks when their byte offsets and lengths are known.
This chunked compression approach is particularly prevalent in web archiving formats, including WARC, WET, WAT, CDX, and ZipNum.
These formats are actively used by major web archiving initiatives like CommonCrawl and the Internet Archive to manage and provide access to petabyte-scale web archives.
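The selective decompression described above can be sketched in a few lines of Python. This is an illustration of the general technique, not GZInspector's own code: the sample records, the `index` list, and the `read_member` helper are all hypothetical names made up for the example.

```python
import gzip
import zlib

# Hypothetical sample data: each record is compressed as its own gzip chunk.
records = [b"record one\n", b"record two\n", b"record three\n"]

blob = b""   # the concatenated chunked file, held in memory for the sketch
index = []   # (offset, length) of every chunk inside blob
for rec in records:
    member = gzip.compress(rec)
    index.append((len(blob), len(member)))
    blob += member


def read_member(data: bytes, offset: int, length: int) -> bytes:
    """Decompress only the chunk stored at data[offset:offset + length]."""
    # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(31)
    return decompressor.decompress(data[offset:offset + length])


# Fetch the second record without touching the first or the third.
offset, length = index[1]
second = read_member(blob, offset, length)

# An ordinary gzip reader ignores the boundaries and returns everything.
everything = gzip.decompress(blob)
```

With a real archive the slice would typically come from a file seek or a ranged read rather than an in-memory buffer, and the offsets would come from an index such as CDX.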
cargo install gzinspector
To install the pre-built binary for Linux:
# Download the binary
# Download latest release from:
# https://github.com/jt55401/gzinspector/releases/latest
wget $(curl -s https://api.github.com/repos/jt55401/gzinspector/releases/latest | grep "browser_download_url.*tar\.gz" | cut -d '"' -f 4)
# Or browse all releases at:
# https://github.com/jt55401/gzinspector/releases
# Extract the binary
tar -xzf gzinspector-linux-x86_64.tar.gz
# Move the binary to a directory in your PATH
sudo mv gzinspector /usr/local/bin/
To install GZInspector from source, you'll need Rust and Cargo installed on your system. Then:
# Clone the repository
git clone https://github.com/jt55401/gzinspector.git
# Build the project
cd gzinspector
cargo build --release
# The binary will be available at target/release/gzinspector
gzinspector [OPTIONS] <FILE>
-o, --output-format <FORMAT>: Output format (human or json) [default: human]
-p, --preview <PREVIEW>: Preview content (format: HEAD:TAIL, e.g. '5:3' shows first 5 and last 3 lines)
-c, --chunks <CHUNKS>: Only show first and last N chunks (format: HEAD:TAIL, e.g. '5:3' shows first 5 and last 3)
-e, --encoding <ENCODING>: Encoding for preview [default: utf-8]
-h, --help: Display help information
-V, --version: Display version information
Basic file inspection:
gzinspector example.gz
Show JSON output:
gzinspector -o json example.gz
Preview content (first 5 lines and last 3 lines):
gzinspector -p 5:3 example.gz
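To make the HEAD:TAIL notation concrete, here is a small Python sketch of how a spec such as `5:3` selects lines. The `split_preview` function is hypothetical and only illustrates the notation; in particular, the choice not to repeat lines when the head and tail would overlap is an assumption, and the tool itself may behave differently.

```python
def split_preview(lines, spec):
    """Return (first, last) line lists for a HEAD:TAIL spec such as '5:3'."""
    head_text, tail_text = spec.split(":")
    head, tail = int(head_text), int(tail_text)

    first = lines[:head]
    # Take the tail only from lines not already shown in the head
    # (an assumption of this sketch), and guard against a tail of 0,
    # because a slice of [-0:] would return the whole list.
    remaining = lines[head:]
    last = remaining[-tail:] if tail > 0 else []
    return first, last


lines = [f"line {i}" for i in range(1, 11)]
first, last = split_preview(lines, "5:3")
# first holds lines 1 through 5, last holds lines 8 through 10
```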
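The human-readable output prints one line per chunk:
📦 #1 │ 0 │ 2.5x │ 📥 1.2KB │ 📤 3.0KB │ ℹ️ deflate|EXTRA|NAME|example.txt
Where the fields are, in order: chunk number, byte offset in the file, compression ratio, compressed size, uncompressed size, and header information.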
JSON output provides detailed information in a machine-readable format:
{
"chunk_number": 1,
"offset": 0,
"compressed_size": 1234,
"uncompressed_size": 3000,
"compression_ratio": 2.43,
"header_info": "deflate|EXTRA|NAME|example.txt"
}
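A record like this carries enough information to fetch a single chunk from a remote archive. The Python sketch below assumes one chunk record shaped exactly like the sample above (how the tool wraps multiple records in its full output is not shown here), and the `byte_range` helper is a hypothetical name. It also assumes that `compression_ratio` is the uncompressed size divided by the compressed size, which the sample figures agree with.

```python
import json

# The sample chunk record from above, parsed as a single JSON object.
record = json.loads("""
{
  "chunk_number": 1,
  "offset": 0,
  "compressed_size": 1234,
  "uncompressed_size": 3000,
  "compression_ratio": 2.43,
  "header_info": "deflate|EXTRA|NAME|example.txt"
}
""")


def byte_range(rec):
    """Build an HTTP Range header value covering exactly one chunk."""
    start = rec["offset"]
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    end = start + rec["compressed_size"] - 1
    return f"bytes={start}-{end}"


range_header = byte_range(record)

# Recompute the ratio from the two sizes (assumed definition).
ratio = round(record["uncompressed_size"] / record["compressed_size"], 2)
```

Sending `range_header` as the `Range` header of an HTTP request would return only the compressed bytes of that chunk, which can then be decompressed on their own.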
Both output formats include a summary showing:
flate2: GZIP/ZLIB compression library
serde: Serialization framework
clap: Command line argument parsing
chrono: Date and time functionality
crc32fast: CRC32 checksum calculation
cargo build --release
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Jason Grey (jason@jason-grey.com)
0.1.0: Initial release
0.2.0: Chunks release