# Build postprocessor to reset metadata fields for build reproducibility

This crate provides the program `add-determinism`, which takes one or more paths as arguments and recursively processes them, running the handlers on any files with matching extensions.
(Each argument can be either a single file or a directory to be processed recursively.)

For each processed file, a temporary file is opened, the contents are rewritten, the modification timestamp is copied from the original file to the temporary copy, and the copy is renamed over the original.
If processing fails, a warning is emitted, but no modifications are made and the program returns success.

The purpose of this tool is to eliminate common sources of non-determinism in builds, making it easier to create reproducible (package) builds.

## Usage

### Standalone

```console
$ add-determinism /path/to/file /path/to/directory
```

Note that the program works in-place, replacing input files with the rewritten versions (if any modifications are made).

Some useful options:

* `-v`: enable debug output
* `-j [N]`: use `N` workers (or as many as CPUs, if `N` is not given)
* `--handler list|HANDLER|-HANDLER`: constrain the list of handlers.
  Takes a comma-separated list of names, either a list of "positive" names, in which case only listed handlers will be used, or a list of "negative" names, each prefixed by minus, in which case the listed handlers will not be used.
  By default, handlers that cannot be initialized are skipped with a warning.
  If a "positive" list is given, failure to initialize a handler will cause an error.
  The special value `list` can be used to list known handlers.
* `--brp`: enable "build root program" mode, see below.

### In an rpm build environment

When invoked with `--brp`, the `$RPM_BUILD_ROOT` environment variable must be defined and not empty.
All arguments must be below `$RPM_BUILD_ROOT`.
This option is intended to be used in rpm macros that define post-install steps.
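
The two `--brp` preconditions above could be checked roughly as follows. This is a minimal sketch, not the crate's code: the function name `check_brp_args` and its signature are hypothetical, and the real implementation may differ, for example in how it treats `..` components, symlinks, or an argument equal to the build root itself.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not part of the crate): validates the `--brp`
/// preconditions. `build_root` is the value of `$RPM_BUILD_ROOT`, if set.
fn check_brp_args(build_root: Option<&str>, args: &[&str]) -> Result<PathBuf, String> {
    // The variable must be defined and not empty.
    let root = match build_root {
        Some(r) if !r.is_empty() => PathBuf::from(r),
        _ => return Err("$RPM_BUILD_ROOT is not set or is empty".to_string()),
    };

    for arg in args {
        let path = Path::new(arg);
        // This sketch does no canonicalization, so it simply rejects any
        // argument containing `..` (an assumption, chosen for simplicity).
        let has_parent = path.components().any(|c| matches!(c, Component::ParentDir));
        // `Path::starts_with` compares whole components, so `/br-other`
        // is not considered to be below `/br`. Note that it also accepts
        // the build root itself.
        if has_parent || !path.starts_with(&root) {
            return Err(format!("{} is not below {}", arg, root.display()));
        }
    }
    Ok(root)
}
```

A caller would typically pass `std::env::var("RPM_BUILD_ROOT").ok().as_deref()` as the first argument; taking the value as a parameter keeps the check independent of the process environment.
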
See [redhat-rpm-config pull request #293](https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/293) for a pull request that added a call to `add-determinism` in `%__os_install_post`.

### Verification instead of modification

When invoked with `--check`, the tool processes all files, but does not actually save any modifications.
Instead, it fails if any files would have been modified.
It also returns an error if any files cannot be read.

## Processors

### `ar`

Accepts `*.a`.

Resets the embedded modification times to `$SOURCE_DATE_EPOCH` and owner:group to 0:0.

### `jar`

Accepts `*.jar`.

This rewrites the zip file using the `zip` crate.
The modification times of archive entries are clamped to `$SOURCE_DATE_EPOCH`.
Extra metadata, i.e. primarily timestamps in UNIX format and DOS permissions, are stripped (also because the crate does not support them).

### `javadoc`

Accepts `*.html`.

This looks at the `
` portion of an HTML file and finds standard lines inserted by Javadoc that specify the file creation date.
For example, `` is replaced by a version without the version and date, and `` is replaced by a version with `$SOURCE_DATE_EPOCH`.

### `pyc`

Accepts `*.pyc`.

This handler implements a `.pyc` file parser for Python bytecode files and cleans up unused "flag references".
It is a Rust reimplementation of the [MarshalParser Python module](https://github.com/fedora-python/marshalparser).

### `pyc-zero-mtime`

Accepts `*.pyc`.

This handler sets the internal timestamp in the `.pyc` file header to 0, and sets the mtime on the corresponding source `.py` file to 0.
This is intended to be used on [OSTree](https://github.com/ostreedev/ostree) systems where mtimes are discarded, causing a mismatch between the timestamp embedded in the `.pyc` file and the filesystem metadata of the `.py` file.

This handler is not enabled by default and must be explicitly requested via `--handler pyc-zero-mtime`.

## Notes

This project is inspired by [strip-nondeterminism](https://salsa.debian.org/reproducible-builds/strip-nondeterminism), but is written from scratch in Rust.

For Debian, build tools are written in Perl and more Perl is not an issue.
But in Fedora/RHEL/…, tools are written in Bash, Python, or compiled, and we don't want to pull Perl into all buildroots.

In addition, the details differ in what kinds of processing we want to do.
For example, Debian does not distribute Python bytecode files.