# rpz

`rpz` consists of a binary crate and a [library crate](https://docs.rs/rpz/latest/rpz). The binary crate, `rpz`, is an application that downloads, parses, and transforms ad-(un)block files from URLs and local file paths into a [response policy zone (RPZ)](https://en.wikipedia.org/wiki/Response_policy_zone) file. This RPZ file can be consumed by a DNS server that supports such files (e.g., [Unbound](https://nlnetlabs.nl/projects/unbound/about/)).

## rpz in action

This example assumes that [`unbound.conf(5)`](https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html) is properly configured: `name` and `zonefile` in the `rpz` section are set to `.` and `/var/unbound/db/rpz` respectively, and `control-enable` in the `remote-control` section is set to `true`.

```bash
[zack@laptop ~]$ cat<<EOF>/usr/local/etc/rpz/config
> timeout = 15
> rpz = "/var/unbound/db/rpz"
> local_dir = "/usr/local/etc/rpz/"
> adblock = [
    "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/BaseFilter/sections/adservers.txt",
    "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/BaseFilter/sections/adservers_firstparty.txt",
    "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt",
    "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/mobile.txt",
    "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/tracking_servers.txt",
    "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/tracking_servers_firstparty.txt",
    "https://raw.githubusercontent.com/easylist/easylist/master/easylist/easylist_adservers.txt",
    "https://raw.githubusercontent.com/easylist/easylist/master/easylist/easylist_thirdparty.txt",
    "https://raw.githubusercontent.com/easylist/easylist/master/easyprivacy/easyprivacy_thirdparty.txt",
    "https://raw.githubusercontent.com/easylist/easylist/master/easyprivacy/easyprivacy_trackingservers.txt",
    "https://malware-filter.gitlab.io/malware-filter/urlhaus-filter-agh.txt"
]
domain = ["https://www.stopforumspam.com/downloads/toxic_domains_whole.txt"]
hosts = ["https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt", "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts"]
wildcard = ["https://pgl.yoyo.org/adservers/serverlist.php?hostformat=adblock&showintro=0&mimetype=plaintext"]
> EOF
[zack@laptop ~]$ cat /usr/local/etc/rpz/unblock/domain/unbound
dpm.demdex.net # ESPN app on PS5 needs this.
[zack@laptop ~]$ rpz -f /usr/local/etc/rpz/config
unblock count written: 1
block count written: 271559
total lines written: 271560
domains parsed: 254147
comments parsed: 6629
blanks parsed: 4519
parsing errors: 24624
[zack@laptop ~]$ head -1 /var/unbound/db/rpz
dpm.demdex.net CNAME rpz-passthru.
[zack@laptop ~]$ tail -6 /var/unbound/db/rpz
stats.zone-telechargement CNAME .
*.stats.zone-telechargement CNAME .
5wh.co.zw CNAME .
www.5wh.co.zw CNAME .
pandi.co.zw CNAME .
www.pandi.co.zw CNAME .
[zack@laptop ~]$ unbound-control -q auth_zone_reload . && unbound-control -q flush_zone . && unbound-control -q flush_negative
```

## Ad-(un)block file format and encoding

All ad-(un)block files must be valid UTF-8; however, for a given domain, each label must contain only 1–63 Unicode scalar values from the set: `!`, `$`, `&`, `'`, `(`, `)`, `+`, `,`, `-`, `0`–`9`, `;`, `=`, `_`, `` ` ``, `A`–`Z`, `a`–`z`, `{`, `}`, and `~`. Labels must be delimited by `.`. Domains in the file must be delimited by a line feed or by a carriage return followed by a line feed. A domain must be less than 254 characters in length, including the `.` label separators. Domains are case-insensitive: uppercase letters are treated as lowercase. Domains must not be an IPv4 address.
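
The label and length rules above can be sketched in a few lines of Rust. This is only an illustration and not the crate's implementation: `is_label_byte` and `normalize_domain` are hypothetical names, the sketch rejects a trailing `.` (an assumption, since the text above does not address it), and it treats "IPv4 address" as the dotted-decimal form accepted by the standard library's `Ipv4Addr` parser.

```rust
/// Returns true if `b` belongs to the label character set listed above.
fn is_label_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!$&'()+,-;=_`{}~".contains(&b)
}

/// Hypothetical helper: validates `input` against the rules above and
/// returns the domain lowercased, or `None` if any rule is violated.
fn normalize_domain(input: &str) -> Option<String> {
    // A domain must be shorter than 254 characters, separators included.
    if input.is_empty() || input.len() >= 254 {
        return None;
    }
    // Every label must have 1 to 63 characters from the allowed set.
    // Assumption of this sketch: a trailing `.` yields an empty label
    // and is therefore rejected.
    for label in input.split('.') {
        if label.is_empty() || label.len() > 63 || !label.bytes().all(is_label_byte) {
            return None;
        }
    }
    // Domains must not be an IPv4 address (dotted-decimal form assumed).
    if input.parse::<std::net::Ipv4Addr>().is_ok() {
        return None;
    }
    // Uppercase letters are treated as lowercase.
    Some(input.to_ascii_lowercase())
}
```

For instance, `normalize_domain("Example.COM")` yields `Some("example.com")`, while `normalize_domain("127.0.0.1")` and `normalize_domain("a..b")` both yield `None`.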
### Adblock-style

Domain constructed from an [Adblock-style rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#adblock-style-syntax) with the requirement that the rule conforms to the following extended regex: `^<ws>*(\|\|)?<ws>*<domain><ws>*\^?<ws>*$` where `<domain>` conforms to a valid [`Domain`](https://docs.rs/ascii_domain/latest/ascii_domain/dom/struct.Domain.html) based on [`ASCII_FIREFOX`](https://docs.rs/ascii_domain/latest/ascii_domain/char_set/constant.ASCII_FIREFOX.html) with the added requirements that the TLD is either all letters or at least length five and begins with `xn--` and does not contain `$`, and `<ws>` is any sequence of [ASCII whitespace](https://infra.spec.whatwg.org/#ascii-whitespace). Lines that begin with `||` cause all subdomains to be blocked (i.e., the domain itself and all proper subdomains); without `||`, only the specific domain is blocked.

Because these files are processed conservatively, one is encouraged to still use an application-level ad blocker (e.g., [uBlock Origin](https://ublockorigin.com/)). Adblock-style files often contain paths as well as additional information (e.g., “third-party”) that requires application-level information to process correctly; `rpz` counts such entries as “parsing errors”.

### Domain-style

Domain constructed from a [domains-only rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#domains-only-syntax) with the requirement that the rule conforms to the following regex: `^<ws>*<domain><ws>*(#.*)?$` where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, and `<ws>` is any sequence of ASCII whitespace. Domains only represent themselves (i.e., proper subdomains will not be blocked).
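
As a rough illustration of the domain-style shape described above (optional whitespace, a domain, optional whitespace, and an optional `#` comment), the following sketch classifies a single line. It is not taken from the crate: `Line`, `tld_ok`, and `classify_domain_line` are hypothetical names, and treating comment-only and whitespace-only lines as their own categories is an assumption made here.

```rust
/// Hypothetical classification of one line of a domain-style file.
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Blank,
    Comment,
    Domain(&'a str),
    Error,
}

/// Returns true if `b` belongs to the permitted label character set.
fn is_label_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!$&'()+,-;=_`{}~".contains(&b)
}

/// The TLD must be all letters, or be at least five long and begin with `xn--`.
fn tld_ok(domain: &str) -> bool {
    let tld = domain.rsplit('.').next().unwrap_or("");
    let all_letters = !tld.is_empty() && tld.bytes().all(|b| b.is_ascii_alphabetic());
    all_letters || (tld.len() >= 5 && tld.starts_with("xn--"))
}

/// Classifies `line` (without its line terminator).
fn classify_domain_line(line: &str) -> Line<'_> {
    // `#` is not a permitted label character, so the first `#` starts the comment.
    let (body, has_comment) = match line.find('#') {
        Some(i) => (&line[..i], true),
        None => (line, false),
    };
    // Strip ASCII whitespace (tab, LF, FF, CR, space) from both ends.
    let body = body.trim_matches(|c: char| c.is_ascii_whitespace());
    if body.is_empty() {
        return if has_comment { Line::Comment } else { Line::Blank };
    }
    let labels_ok = body
        .split('.')
        .all(|l| (1..=63).contains(&l.len()) && l.bytes().all(is_label_byte));
    if body.len() < 254 && labels_ok && tld_ok(body) {
        Line::Domain(body)
    } else {
        Line::Error
    }
}
```

Under these assumptions, `classify_domain_line("  example.com  # note")` gives `Line::Domain("example.com")`, and `classify_domain_line("example.123")` gives `Line::Error` because the TLD is neither all letters nor an `xn--` label.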
### Hosts-style

Domain constructed from a [`hosts(5)`-style rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#etc-hosts-syntax) with the requirement that the rule conforms to the following extended regex: `^<ws>*<ip><ws>+<domain><ws>*(#.*)?$` where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, `<ws>` is any sequence of ASCII whitespace, and `<ip>` is one of the following: `::`, `::1`, `0.0.0.0`, or `127.0.0.1`. Domains only represent themselves (i.e., proper subdomains will not be blocked).

### Wildcard-style

Domain constructed from a [wildcard domain rule](https://pgl.yoyo.org/adservers/serverlist.php?hostformat=adblock&showintro=0&mimetype=plaintext) with the requirement that the rule conforms to the following extended regex: `^<ws>*(\*\.)?<domain><ws>*(#.*)?$` where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, and `<ws>` is any sequence of ASCII whitespace. If `<domain>` is preceded by `*.`, then `<domain>` must have length less than 252 and all proper subdomains are blocked; this does _not_ include the domain itself. Otherwise, only `<domain>` is blocked.

## Config file

Either `-` or the absolute path to the TOML config file must be passed via the `-f`/`--file` CLI option. If `-` is passed, then `stdin` will be read. The format of this file must conform to the following:

```bash
timeout =
rpz =
local_dir =
adblock = []
domain = []
hosts = []
wildcard = []
```

If `rpz` is not specified, then the RPZ file will be written to `stdout`. If `local_dir` is specified, its `block/` and `unblock/` subdirectories are searched; within each of those, the `adblock/`, `domain/`, `hosts/`, and `wildcard/` subdirectories are searched for files, which are parsed according to the directory they are in. It is not an error if any of the directories do not exist.
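
The directory layout under `local_dir` can be pictured with the following sketch, which enumerates the eight subdirectories and tolerates missing ones. It is a minimal illustration under stated assumptions rather than the crate's code: `Action`, `local_subdirs`, and `files_in` are hypothetical names, and only regular files directly inside each directory are listed.

```rust
use std::path::{Path, PathBuf};

/// Whether the files in a directory add block or unblock entries.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    Block,
    Unblock,
}

/// Hypothetical helper: returns every (action, style, path) triple that
/// would be searched beneath `local_dir`.
fn local_subdirs(local_dir: &Path) -> Vec<(Action, &'static str, PathBuf)> {
    let mut dirs = Vec::with_capacity(8);
    for (action, top) in [(Action::Block, "block"), (Action::Unblock, "unblock")] {
        for style in ["adblock", "domain", "hosts", "wildcard"] {
            // The style directory decides how the files inside are parsed.
            dirs.push((action, style, local_dir.join(top).join(style)));
        }
    }
    dirs
}

/// Lists the regular files directly inside `dir`. A directory that does
/// not exist yields an empty list instead of an error.
fn files_in(dir: &Path) -> std::io::Result<Vec<PathBuf>> {
    let entries = match std::fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };
    let mut files = Vec::new();
    for entry in entries {
        let path = entry?.path();
        if path.is_file() {
            files.push(path);
        }
    }
    Ok(files)
}
```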
When any of the array keys are specified, URLs must be unique across all arrays. The files these URLs point to are interpreted as block files (i.e., unblock files are only allowed on the local file system). The `timeout` is the maximum number of _seconds_ allowed for an HTTP(S) file to be downloaded. If it does not exist or has a value of 0, then a timeout of one hour will be used. If the value specified exceeds one hour, then it will be truncated to one hour.

## RPZ file

Unless `stdout` is the destination, a temporary RPZ file is written in the same location as the `rpz` value in the config file except with `tmp` appended to the name. Upon success, this file is renamed to the `rpz` value in the config file. The file contains the minimum number of lines possible, with unblock entries taking precedence over block entries. If there are no block entries or the temporary file already exists, the program will abort.

## Options

When `rpz` is passed `-V`/`--version`, the version of `rpz` will be printed to `stdout`. When passed `-h`/`--help`, information about the program and its options will be printed to `stdout`. When passed `-f`/`--file` along with `-` or the absolute path to the TOML config file, `rpz` will run normally, printing summary information to `stdout` upon completion. One can additionally pass `-q`/`--quiet` along with `-f`/`--file` to suppress the summary information. When `-v`/`--verbose` is passed along with `-f`/`--file`, itemized summary information for each input file, including the kinds and counts of errors, will be printed to `stdout` in addition to the normal summary information.

### Example

If `www.example.com`, `*.example.com`, and `foo.com` are to be blocked while `foo.example.com` and `||foo.com` are to be unblocked, the RPZ file would look like the following:

```bash
foo.example.com CNAME rpz-passthru.
*.example.com CNAME .
```

Upon success, the numbers of unblock, block, and total lines written are printed to `stdout`, along with the total numbers of domains, comments, blanks, and parsing errors.

## Errors

Parsing errors are ignored; all other errors are written to `stderr` before the program aborts.

## License

Licensed under either of

* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0).
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT).

at your option.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

### Status

This package is actively maintained.

The crates are only tested on the `x86_64-unknown-linux-gnu` and `x86_64-unknown-openbsd` targets, but they should work on other platforms.

Nightly `rustc` is required. Once `BTreeMap` [cursors are stabilized](https://github.com/rust-lang/rust/issues/107540), stable `rustc` will work. On OpenBSD-stable, one can use the `rust` port as long as `RUSTC_BOOTSTRAP` is `export`ed with a value of `1` before invoking `cargo build --all-features --release` or `cargo install --all-features rpz`.