| | |
|---|---|
| Crates.io | integrity-checker |
| lib.rs | integrity-checker |
| version | 0.2.2 |
| source | src |
| created_at | 2018-07-01 20:40:15.975989 |
| updated_at | 2023-02-13 20:40:27.036585 |
| description | integrity checker for backups and filesystems |
| homepage | https://github.com/elliottslaughter/integrity-checker |
| repository | https://github.com/elliottslaughter/integrity-checker |
| max_upload_size | |
| id | 72480 |
| size | 111,814 |
This tool is an integrity checker for backups and filesystems.
Given a directory, the tool constructs a database of metadata (hashes, sizes, timestamps, etc.) of the contents. The database itself is of course checksummed as well.
Given two databases (or a database and a directory) the tool iterates the entries and prints a helpful summary of the differences between them. For example, the tool highlights suspicious patterns, such as files which got truncated (had non-zero size, and now have zero size) or have other patterns that could indicate corruption (e.g. the presence of NUL bytes, if the file originally had none). Surfacing useful data while minimizing false positives is an ongoing effort.
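As a sketch of the kind of heuristic described above, the following Python function flags the two patterns mentioned. The function name and return values are hypothetical; the tool's actual checks and their implementation may differ:

```python
def looks_suspicious(old, new):
    """Classify a file change as possible corruption rather than an
    ordinary edit. `old` and `new` are (size, contents) pairs.
    Illustrative only, not the tool's actual logic."""
    old_size, old_data = old
    new_size, new_data = new
    # A file that had content and now has zero size may have been
    # truncated by a faulty restore or sync.
    if old_size > 0 and new_size == 0:
        return "truncated"
    # NUL bytes appearing in a file that previously had none can
    # indicate corruption (e.g. blocks zeroed out by a failing disk).
    if b"\x00" not in old_data and b"\x00" in new_data:
        return "nul-bytes"
    return None
```

A real implementation has to balance such heuristics against false positives, since many legitimate edits also shrink files or introduce binary content.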
Here are a couple of sample use cases:
Backup integrity checking: Record a database when you make a backup. When restoring the backup, compare against the database to make sure the backup restore function has worked properly. (Or better, perform this check periodically to ensure that the backups are functioning properly.)
Continuous sync sanity checking: Suppose you use a tool like Dropbox. In theory, your files are "backed up" on a continuous basis. In practice, you have no assurance that the tool isn't modifying files behind your back. By recording databases periodically, you can sanity check that directories that shouldn't change often are in fact not changing. (Note: For this to be useful, the tool has to be very good at minimizing false positives.)
This also applies to any live filesystem. Consider that a typical user will maintain continuity of data across possibly decades of hardware and filesystem upgrades. Every transition is an opportunity for silent data corruption. This tool can provide peace of mind that integrity is preserved for long-lived data.
The tool is designed around an especially stable database format so that if something were to happen, it would be relatively straightforward to recover the contained metadata.
For users running macOS or Linux on x86(-64), run:

```
cargo install integrity-checker --features=asm
```

Other users should run:

```
cargo install integrity-checker
```

The `asm` feature enables an optimization in the `sha2` crate which makes the SHA2 hash implementation faster.
To build a database `db.json.gz` from the directory at `path`, run:

```
ick build db.json.gz path
```
There are several operations one can perform on a database. The following commands check a database against a directory, diff two databases, and validate a single database, respectively.
```
ick check db.json.gz path
ick diff db.json.gz db2.json.gz
ick selfcheck db.json.gz
```
See the format description.
Corpus: Linux 4.16.7 source (4403 directories, 62872 files, 890 MiB)
Machine: 2016 MacBook Pro 2.7 GHz Quad-Core i7
| Configuration | Time (s) | BW (MiB/s) |
|---|---|---|
| No Hash | 0.8832 | 1007.7 |
| SHA2-512/256 | 1.3128 | 677.9 |
| Blake2b | 1.3034 | 682.8 |
| SHA2-512/256 + Blake2b | 1.8119 | 491.2 |
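The bandwidth column is simply the corpus size divided by the wall-clock time, which is easy to verify:

```python
CORPUS_MIB = 890  # Linux 4.16.7 source tree, per the benchmark setup

def bandwidth(time_s):
    """Throughput in MiB/s for processing the corpus in time_s seconds."""
    return CORPUS_MIB / time_s

# 890 MiB / 0.8832 s is about 1007.7 MiB/s, matching the "No Hash" row.
```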
Isn't this better served by existing tools? ZFS, Tarsnap, etc. should never corrupt your data.
Well, it depends. Not all users have access to a filesystem that checksums file contents, or to a machine with ECC RAM, and even the ones that do may experience filesystem bugs. In general, defense in depth is good, even with relatively trustworthy tools such as ZFS and Tarsnap. Also, in the continuous sync use case, even with backups, it can often be difficult to be assured that you haven't been subject to silent data corruption. This tool can be part of a larger toolkit for ensuring the validity of long-term storage.
`-v`: flag that shows verbose diffs