% ATTENTION: This file was automatically generated using cargo xtask.
% Do not manually edit this file!

# CLI Reference

This document contains the help content for the `warcat` command-line program.

## `warcat`

WARC archive tool

**Usage:** `warcat [OPTIONS] <COMMAND>`

###### **Subcommands:**

* `export` — Decodes a WARC file to messages in a easier-to-process format such as JSON
* `import` — Encodes a WARC file from messages in a format of the `export` subcommand
* `list` — Provides a listing of the WARC records
* `get` — Returns a single WARC record
* `extract` — Extracts resources for casual viewing of the WARC contents
* `verify` — Perform specification and integrity checks on WARC files
* `self` — Self-installer and uninstaller

###### **Options:**

* `-q`, `--quiet` — Disable any progress messages.

   Does not affect logging.
* `--log-level <LOG_LEVEL>` — Filter log messages by level

  Default value: `off`

  Possible values: `trace`, `debug`, `info`, `warn`, `error`, `off`

* `--log-file <LOG_FILE>` — Write log messages to the given file instead of standard error
* `--log-json` — Write log messages as JSON sequences instead of a console logging format


## `warcat export`

Decodes a WARC file to messages in a easier-to-process format such as JSON

**Usage:** `warcat export [OPTIONS]`

###### **Options:**

* `--input <INPUT>` — Path to a WARC file

  Default value: `-`
* `--compression <COMPRESSION>` — Specify the compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`:
    Automatically detect the format by the filename extension
  - `none`:
    No compression
  - `gzip`:
    Gzip format (such as ".warc.gz" files)
  - `zstandard`:
    Zstandard format (such as ".warc.zst" files)

* `--output <OUTPUT>` — Path for the output messages

  Default value: `-`
* `--format <FORMAT>` — Format for the output messages

  Default value: `json-seq`

  Possible values:
  - `json-seq`:
    JSON sequences (RFC 7464). Each message is a JSON object delimitated by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`:
    JSON Lines. Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`:
    CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items

* `--no-block` — Do not output block messages
* `--extract` — Output extract messages


## `warcat import`

Encodes a WARC file from messages in a format of the `export` subcommand

**Usage:** `warcat import [OPTIONS]`

###### **Options:**

* `--input <INPUT>` — Path to the input messages

  Default value: `-`
* `--format <FORMAT>` — Format for the input messages

  Default value: `json-seq`

  Possible values:
  - `json-seq`:
    JSON sequences (RFC 7464). Each message is a JSON object delimitated by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`:
    JSON Lines. Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`:
    CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items

* `--output <OUTPUT>` — Path of the output WARC file

  Default value: `-`
* `--compression <COMPRESSION>` — Compression format of the output WARC file

  Default value: `auto`

  Possible values:
  - `auto`:
    Automatically detect the format by the filename extension
  - `none`:
    No compression
  - `gzip`:
    Gzip format (such as ".warc.gz" files)
  - `zstandard`:
    Zstandard format (such as ".warc.zst" files)

* `--compression-level <COMPRESSION_LEVEL>` — Level of compression for the output

  Default value: `high`

  Possible values:
  - `balanced`:
    A balance between compression ratio and resource consumption
  - `high`:
    Use a reasonably increased amount of resources to achieve a better compression ratio
  - `low`:
    Fast and low resource usage, but lower compression ratio


## `warcat list`

Provides a listing of the WARC records

**Usage:** `warcat list [OPTIONS]`

###### **Options:**

* `--input <INPUT>` — Path of the WARC file

  Default value: `-`
* `--compression <COMPRESSION>` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`:
    Automatically detect the format by the filename extension
  - `none`:
    No compression
  - `gzip`:
    Gzip format (such as ".warc.gz" files)
  - `zstandard`:
    Zstandard format (such as ".warc.zst" files)

* `--output <OUTPUT>` — Path to output listings

  Default value: `-`
* `--format <FORMAT>` — Format of the output

  Default value: `json-seq`

  Possible values:
  - `json-seq`:
    JSON sequences (RFC 7464). Each message is a JSON object delimitated by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`:
    JSON Lines. Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`:
    CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items
  - `csv`:
    Comma separated values

* `--field <FIELD>` — Fields to include in the listing.

   The option accepts names of fields that occur in a WARC header.

   The pseudo-name `:position` represents the position in the file. `:file` represents the path of the file.

  Default value: `:position,WARC-Record-ID,WARC-Type,Content-Type,WARC-Target-URI`


## `warcat get`

Returns a single WARC record

**Usage:** `warcat get <COMMAND>`

###### **Subcommands:**

* `export` — Output export messages
* `extract` — Extract a resource


## `warcat get export`

Output export messages

**Usage:** `warcat get export [OPTIONS] --position <POSITION> --id <ID>`

###### **Options:**

* `--input <INPUT>` — Path of the WARC file

  Default value: `-`
* `--compression <COMPRESSION>` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`:
    Automatically detect the format by the filename extension
  - `none`:
    No compression
  - `gzip`:
    Gzip format (such as ".warc.gz" files)
  - `zstandard`:
    Zstandard format (such as ".warc.zst" files)

* `--position <POSITION>` — Position where the record is located in the input WARC file
* `--id <ID>` — The ID of the record to extract
* `--output <OUTPUT>` — Path for the output messages

  Default value: `-`
* `--format <FORMAT>` — Format for the output messages

  Default value: `json-seq`

  Possible values:
  - `json-seq`:
    JSON sequences (RFC 7464). Each message is a JSON object delimitated by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`:
    JSON Lines. Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`:
    CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items

* `--no-block` — Do not output block messages
* `--extract` — Output extract messages


## `warcat get extract`

Extract a resource

**Usage:** `warcat get extract [OPTIONS] --position <POSITION> --id <ID>`

###### **Options:**

* `--input <INPUT>`

  Default value: `-`
* `--compression <COMPRESSION>` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`:
    Automatically detect the format by the filename extension
  - `none`:
    No compression
  - `gzip`:
    Gzip format (such as ".warc.gz" files)
  - `zstandard`:
    Zstandard format (such as ".warc.zst" files)

* `--position <POSITION>` — Position where the record is located in the input WARC file
* `--id <ID>` — The ID of the record to extract
* `--output <OUTPUT>` — Path for the output file

  Default value: `-`


## `warcat extract`

Extracts resources for casual viewing of the WARC contents.

Files are extracted to a directory structure similar to the archived URL.

This operation does not automatically permit offline viewing of archived websites; no content conversion or link-rewriting is performed.

**Usage:** `warcat extract [OPTIONS]`

###### **Options:**

* `--input <INPUT>` — Path to the WARC file

  Default value: `-`
* `--compression <COMPRESSION>` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`:
    Automatically detect the format by the filename extension
  - `none`:
    No compression
  - `gzip`:
    Gzip format (such as ".warc.gz" files)
  - `zstandard`:
    Zstandard format (such as ".warc.zst" files)

* `--output <OUTPUT>` — Path to the output directory

  Default value: `./`
* `--continue-on-error` — Whether to ignore errors
* `--include <INCLUDE>` — Select only records with a field.

   Rule format is "NAME" or "NAME:VALUE".
* `--include-pattern <INCLUDE_PATTERN>` — Select only records matching a regular expression.

   Rule format is "NAME:VALUEPATTERN".
* `--exclude <EXCLUDE>` — Do not select records with a field.

   Rule format is "NAME" or "NAME:VALUE".
* `--exclude-pattern <EXCLUDE_PATTERN>` — Do not select records matching a regular expression.

   Rule format is "NAME:VALUEPATTERN".


## `warcat verify`

Perform specification and integrity checks on WARC files

**Usage:** `warcat verify [OPTIONS]`

###### **Options:**

* `--input <INPUT>` — Path to the WARC file

  Default value: `-`
* `--compression <COMPRESSION>` — Compression format of the input WARC file

  Default value: `auto`

  Possible values:
  - `auto`:
    Automatically detect the format by the filename extension
  - `none`:
    No compression
  - `gzip`:
    Gzip format (such as ".warc.gz" files)
  - `zstandard`:
    Zstandard format (such as ".warc.zst" files)

* `--output <OUTPUT>` — Path to output problems

  Default value: `-`
* `--format <FORMAT>` — Format of the output

  Default value: `json-seq`

  Possible values:
  - `json-seq`:
    JSON sequences (RFC 7464). Each message is a JSON object delimitated by a Record Separator (U+001E) and a Line Feed (U+000A)
  - `jsonl`:
    JSON Lines. Each message is a JSON object terminated by a Line Feed (U+000A)
  - `cbor-seq`:
    CBOR sequences (RFC 8742). Messages are a series of consecutive CBOR data items
  - `csv`:
    Comma separated values

* `--exclude-check <EXCLUDE_CHECK>` — Do not perform check

  Possible values: `mandatory-fields`, `known-record-type`, `content-type`, `concurrent-to`, `block-digest`, `payload-digest`, `ip-address`, `refers-to`, `refers-to-target-uri`, `refers-to-date`, `target-uri`, `truncated`, `warcinfo-id`, `filename`, `profile`, `segment`, `record-at-time-compression`

* `--database <DATABASE>` — Database filename for storing temporary intermediate data


## `warcat self`

Self-installer and uninstaller

**Usage:** `warcat self <COMMAND>`

###### **Subcommands:**

* `install` — Launch the interactive self-installer
* `uninstall` — Launch the interactive uninstaller


## `warcat self install`

Launch the interactive self-installer

**Usage:** `warcat self install [OPTIONS]`

###### **Options:**

* `--quiet` — Install automatically without user interaction


## `warcat self uninstall`

Launch the interactive uninstaller

**Usage:** `warcat self uninstall [OPTIONS]`

###### **Options:**

* `--quiet` — Uninstall automatically without user interaction