% ATTENTION: This file was automatically generated using cargo xtask.
% Do not manually edit this file!

# CLI Reference

This document contains the help content for the `warcat` command-line program.

## `warcat`

WARC archive tool

**Usage:** `warcat [OPTIONS] `

###### **Subcommands:**

* `export` — Decodes a WARC file to messages in an easier-to-process format such as JSON
* `import` — Encodes a WARC file from messages in a format of the `export` subcommand
* `list` — Provides a listing of the WARC records
* `get` — Returns a single WARC record
* `extract` — Extracts resources for casual viewing of the WARC contents
* `verify` — Perform specification and integrity checks on WARC files
* `self` — Self-installer and uninstaller

###### **Options:**

* `-q`, `--quiet` — Disable any progress messages. Does not affect logging.
* `--log-level ` — Filter log messages by level

  Default value: `off`

  Possible values: `trace`, `debug`, `info`, `warn`, `error`, `off`

* `--log-file ` — Write log messages to the given file instead of standard error
* `--log-json` — Write log messages as JSON sequences instead of a console logging format

## `warcat export`

Decodes a WARC file to messages in an easier-to-process format such as JSON

**Usage:** `warcat export [OPTIONS]`

###### **Options:**

* `--input ` — Path to a WARC file

  Default value: `-`
* `--compression ` — Specify the compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`: Automatically detect the format by the filename extension
  - `none`: No compression
  - `gzip`: Gzip format (such as ".warc.gz" files)
  - `zstandard`: Zstandard format (such as ".warc.zst" files)

* `--output ` — Path for the output messages

  Default value: `-`
* `--format ` — Format for the output messages

  Default value: `json-seq`

  Possible values:
  - `json-seq`: JSON sequences (RFC 7464). Each message is a JSON object delimited by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`: JSON Lines.
    Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`: CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items

* `--no-block` — Do not output block messages
* `--extract` — Output extract messages

## `warcat import`

Encodes a WARC file from messages in a format of the `export` subcommand

**Usage:** `warcat import [OPTIONS]`

###### **Options:**

* `--input ` — Path to the input messages

  Default value: `-`
* `--format ` — Format for the input messages

  Default value: `json-seq`

  Possible values:
  - `json-seq`: JSON sequences (RFC 7464). Each message is a JSON object delimited by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`: JSON Lines. Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`: CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items

* `--output ` — Path of the output WARC file

  Default value: `-`
* `--compression ` — Compression format of the output WARC file

  Default value: `auto`

  Possible values:
  - `auto`: Automatically detect the format by the filename extension
  - `none`: No compression
  - `gzip`: Gzip format (such as ".warc.gz" files)
  - `zstandard`: Zstandard format (such as ".warc.zst" files)

* `--compression-level ` — Level of compression for the output

  Default value: `high`

  Possible values:
  - `balanced`: A balance between compression ratio and resource consumption
  - `high`: Use a reasonably increased amount of resources to achieve a better compression ratio
  - `low`: Fast and low resource usage, but lower compression ratio

## `warcat list`

Provides a listing of the WARC records

**Usage:** `warcat list [OPTIONS]`

###### **Options:**

* `--input ` — Path of the WARC file

  Default value: `-`
* `--compression ` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`: Automatically detect the format by the filename extension
  - `none`: No compression
  - `gzip`: Gzip format (such as ".warc.gz" files)
  - `zstandard`:
    Zstandard format (such as ".warc.zst" files)

* `--output ` — Path to output listings

  Default value: `-`
* `--format ` — Format of the output

  Default value: `json-seq`

  Possible values:
  - `json-seq`: JSON sequences (RFC 7464). Each message is a JSON object delimited by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`: JSON Lines. Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`: CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items
  - `csv`: Comma-separated values

* `--field ` — Fields to include in the listing. The option accepts names of fields that occur in a WARC header. The pseudo-name `:position` represents the position in the file. `:file` represents the path of the file.

  Default value: `:position,WARC-Record-ID,WARC-Type,Content-Type,WARC-Target-URI`

## `warcat get`

Returns a single WARC record

**Usage:** `warcat get `

###### **Subcommands:**

* `export` — Output export messages
* `extract` — Extract a resource

## `warcat get export`

Output export messages

**Usage:** `warcat get export [OPTIONS] --position --id `

###### **Options:**

* `--input ` — Path of the WARC file

  Default value: `-`
* `--compression ` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`: Automatically detect the format by the filename extension
  - `none`: No compression
  - `gzip`: Gzip format (such as ".warc.gz" files)
  - `zstandard`: Zstandard format (such as ".warc.zst" files)

* `--position ` — Position where the record is located in the input WARC file
* `--id ` — The ID of the record to extract
* `--output ` — Path for the output messages

  Default value: `-`
* `--format ` — Format for the output messages

  Default value: `json-seq`

  Possible values:
  - `json-seq`: JSON sequences (RFC 7464). Each message is a JSON object delimited by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`: JSON Lines.
    Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`: CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items

* `--no-block` — Do not output block messages
* `--extract` — Output extract messages

## `warcat get extract`

Extract a resource

**Usage:** `warcat get extract [OPTIONS] --position --id `

###### **Options:**

* `--input `

  Default value: `-`
* `--compression ` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`: Automatically detect the format by the filename extension
  - `none`: No compression
  - `gzip`: Gzip format (such as ".warc.gz" files)
  - `zstandard`: Zstandard format (such as ".warc.zst" files)

* `--position ` — Position where the record is located in the input WARC file
* `--id ` — The ID of the record to extract
* `--output ` — Path for the output file

  Default value: `-`

## `warcat extract`

Extracts resources for casual viewing of the WARC contents.

Files are extracted to a directory structure similar to the archived URL.

This operation does not automatically permit offline viewing of archived websites; no content conversion or link-rewriting is performed.

**Usage:** `warcat extract [OPTIONS]`

###### **Options:**

* `--input ` — Path to the WARC file

  Default value: `-`
* `--compression ` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`: Automatically detect the format by the filename extension
  - `none`: No compression
  - `gzip`: Gzip format (such as ".warc.gz" files)
  - `zstandard`: Zstandard format (such as ".warc.zst" files)

* `--output ` — Path to the output directory

  Default value: `./`
* `--continue-on-error` — Whether to ignore errors
* `--include ` — Select only records with a field. Rule format is "NAME" or "NAME:VALUE".
* `--include-pattern ` — Select only records matching a regular expression. Rule format is "NAME:VALUEPATTERN".
* `--exclude ` — Do not select records with a field.
  Rule format is "NAME" or "NAME:VALUE".
* `--exclude-pattern ` — Do not select records matching a regular expression. Rule format is "NAME:VALUEPATTERN".

## `warcat verify`

Perform specification and integrity checks on WARC files

**Usage:** `warcat verify [OPTIONS]`

###### **Options:**

* `--input ` — Path to the WARC file

  Default value: `-`
* `--compression ` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`: Automatically detect the format by the filename extension
  - `none`: No compression
  - `gzip`: Gzip format (such as ".warc.gz" files)
  - `zstandard`: Zstandard format (such as ".warc.zst" files)

* `--output ` — Path to output problems

  Default value: `-`
* `--format ` — Format of the output

  Default value: `json-seq`

  Possible values:
  - `json-seq`: JSON sequences (RFC 7464). Each message is a JSON object delimited by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`: JSON Lines. Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`: CBOR sequences (RFC 8742).
    Messages are a series of consecutive CBOR data items
  - `csv`: Comma-separated values

* `--exclude-check ` — Do not perform check

  Possible values: `mandatory-fields`, `known-record-type`, `content-type`, `concurrent-to`, `block-digest`, `payload-digest`, `ip-address`, `refers-to`, `refers-to-target-uri`, `refers-to-date`, `target-uri`, `truncated`, `warcinfo-id`, `filename`, `profile`, `segment`, `record-at-time-compression`

* `--database ` — Database filename for storing temporary intermediate data

## `warcat self`

Self-installer and uninstaller

**Usage:** `warcat self `

###### **Subcommands:**

* `install` — Launch the interactive self-installer
* `uninstall` — Launch the interactive uninstaller

## `warcat self install`

Launch the interactive self-installer

**Usage:** `warcat self install [OPTIONS]`

###### **Options:**

* `--quiet` — Install automatically without user interaction

## `warcat self uninstall`

Launch the interactive uninstaller

**Usage:** `warcat self uninstall [OPTIONS]`

###### **Options:**

* `--quiet` — Uninstall automatically without user interaction
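
The `json-seq` and `jsonl` formats listed for the `export`, `list`, and `verify` subcommands differ only in how messages are framed, so a consumer needs very little code to read either one. The Python sketch below is not part of warcat. The function names `parse_json_seq` and `parse_jsonl` and the sample messages are hypothetical, and the real message schema is not described in this reference.

```python
import json

# Record Separator (U+001E): in RFC 7464 it precedes every message.
RS = "\x1e"


def parse_json_seq(text):
    """Yield one decoded JSON value per RS-delimited message."""
    for chunk in text.split(RS):
        chunk = chunk.strip()  # drop the trailing Line Feed (U+000A)
        if chunk:  # skip the empty piece before the first RS
            yield json.loads(chunk)


def parse_jsonl(text):
    """Yield one decoded JSON value per Line Feed-terminated line."""
    # Split on "\n" only: str.splitlines() also breaks on other
    # characters, some of which may appear inside JSON strings.
    for line in text.split("\n"):
        if line.strip():
            yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical sample data, not real warcat output.
    seq_sample = '\x1e{"n": 1}\n\x1e{"n": 2}\n'
    jsonl_sample = '{"n": 1}\n{"n": 2}\n'
    print(list(parse_json_seq(seq_sample)))
    print(list(parse_jsonl(jsonl_sample)))
```

To try this on real data, the text could be read from a file written with `--output` together with `--format json-seq` or `--format jsonl`.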