Crates.io | backup-deduplicator |
lib.rs | backup-deduplicator |
version | 0.3.0 |
source | src |
created_at | 2024-03-30 12:34:13.216004 |
updated_at | 2024-04-11 18:05:35.816819 |
description | A tool to deduplicate backups. It builds a hash tree of all files and folders in the target directory. Optionally also traversing into archives like zip or tar files. The hash tree is then used to find duplicate files and folders. |
homepage | https://github.com/0xCCF4/BackupDeduplicator |
repository | https://github.com/0xCCF4/BackupDeduplicator |
max_upload_size | |
id | 1191038 |
size | 217,964 |
A tool to deduplicate backups. It builds a hash tree of all files and folders in a target directory. Optionally also traversing into archives like zip or tar files (feature in development). The hash tree is then used to find duplicate files and folders. The output is a minimal duplicated set. Therefore, the tool discovers entire duplicated folder structures and not just single files.
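The hash-tree idea described above can be sketched briefly. This is a minimal illustration, not the tool's actual implementation: std's DefaultHasher stands in for the configurable hash functions (SHA1, SHA2, XXH3). A file's hash is derived from its contents and a directory's hash from its children's hashes, so structurally identical subtrees produce identical hashes and entire duplicated folder structures can be found by comparing single values.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A file node carries contents; a directory node carries child nodes.
enum Node {
    File { contents: Vec<u8> },
    Dir { children: Vec<Node> },
}

// A file's hash is computed from its contents; a directory's hash is
// computed from the hashes of its children. Identical subtrees therefore
// hash to the same value.
fn tree_hash(node: &Node) -> u64 {
    let mut h = DefaultHasher::new();
    match node {
        Node::File { contents } => contents.hash(&mut h),
        Node::Dir { children } => {
            for child in children {
                tree_hash(child).hash(&mut h);
            }
        }
    }
    h.finish()
}

fn main() {
    let a = Node::Dir { children: vec![Node::File { contents: b"data".to_vec() }] };
    let b = Node::Dir { children: vec![Node::File { contents: b"data".to_vec() }] };
    // Two structurally identical directories hash to the same value.
    println!("duplicate: {}", tree_hash(&a) == tree_hash(&b));
}
```

Comparing directory hashes instead of walking both subtrees is what lets the tool report a minimal duplicated set: once two directory hashes match, none of their descendants need to be reported individually.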
Backup Deduplicator solves the problem of having multiple backups of the same data, where parts of the data are duplicated. Duplicates can be reviewed and removed to save disk space (feature in development).
The tool is a command line tool. There are two stages: build and analyze.

backup-deduplicator build [OPTIONS] <target> builds a hash tree of the target directory. The hash tree is saved to disk and is used by the next stage.

backup-deduplicator analyze [OPTIONS] reads the hash tree and outputs a list of duplicated structures to an analysis result file.

Example usage to build a hash tree of a directory:
backup-deduplicator --threads 16 build --working-directory /parent --output /parent/hash.bdd /parent/target
This will build a hash tree of the directory /parent/target and save it to hash.bdd in the parent directory /parent. The tool will use 16 threads to split the hash calculation work.
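How hash work might be split across a fixed number of worker threads can be sketched as follows. This is an illustrative work-queue pattern, not the tool's actual threading code; plain byte buffers stand in for file contents and DefaultHasher stands in for the configured hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};
use std::thread;

// Each worker repeatedly pops one item from a shared queue and hashes it.
// Results are tagged with their original index so the output order is
// independent of thread scheduling.
fn parallel_hash(items: Vec<Vec<u8>>, threads: usize) -> Vec<u64> {
    let queue: Arc<Mutex<Vec<(usize, Vec<u8>)>>> =
        Arc::new(Mutex::new(items.into_iter().enumerate().collect()));
    let results: Arc<Mutex<Vec<(usize, u64)>>> = Arc::new(Mutex::new(Vec::new()));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let queue = Arc::clone(&queue);
        let results = Arc::clone(&results);
        handles.push(thread::spawn(move || loop {
            // Lock only long enough to take one work item.
            let item = queue.lock().unwrap().pop();
            match item {
                Some((idx, data)) => {
                    let mut h = DefaultHasher::new();
                    data.hash(&mut h);
                    results.lock().unwrap().push((idx, h.finish()));
                }
                None => break, // queue drained, worker exits
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // Restore the original input order.
    let mut out = results.lock().unwrap().clone();
    out.sort_by_key(|&(i, _)| i);
    out.into_iter().map(|(_, h)| h).collect()
}

fn main() {
    let hashes = parallel_hash(vec![b"a".to_vec(), b"b".to_vec(), b"a".to_vec()], 16);
    println!("{hashes:?}");
}
```

Because each worker holds the queue lock only while popping one item, the hashing itself runs fully in parallel; the thread count maps onto the --threads option shown above.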
Example usage to analyze a hash tree:
backup-deduplicator analyze --output /parent/analysis.bdd /parent/hash.bdd
This will analyze the hash tree in hash.bdd and save the analysis result to analysis.bdd.
The analysis file will then contain a list of JSON objects (one per line),
each representing a found duplicated structure.
Further processing with this tool is in development.
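Since further processing within the tool is still in development, the line-per-object analysis file can already be consumed by other programs. A minimal sketch, assuming only the documented layout (one JSON object per line) and not any particular field names, simply counts the reported duplicated structures; the file name analysis.bdd follows the example above.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

// Each non-empty line of the analysis file is one JSON object describing a
// duplicated structure. The field layout is not documented here, so this
// sketch only counts the entries rather than parsing them.
fn count_duplicate_sets(path: &str) -> std::io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    Ok(reader
        .lines()
        .filter_map(Result::ok)
        .filter(|l| !l.trim().is_empty())
        .count())
}

fn main() {
    match count_duplicate_sets("analysis.bdd") {
        Ok(n) => println!("{n} duplicated structures found"),
        Err(e) => eprintln!("could not read analysis file: {e}"),
    }
}
```

A real consumer would deserialize each line (for example with serde_json) once the object schema is stable.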
The tool is written in Rust and can be installed using cargo:
cargo install backup-deduplicator
Precompiled binaries are available for download on the release page: https://github.com/0xCCF4/BackupDeduplicator/releases.
The tool uses Rust feature flags to enable or disable certain features. The following flags are available:

hash-sha1: use the sha1 crate to enable the SHA1 hash function
hash-sha2: use the sha2 crate to enable the SHA512 and SHA256 hash functions
hash-xxh: use the xxhash-rust crate to enable the XXH3 (32/64) hash functions

Contributions to the project are welcome! If you have a feature request, a bug report, or want to contribute to the code, please open an issue or a pull request.
This project is licensed under the GPLv3 license. See the LICENSE file for details.