Crates.io | duplicate_destroyer |
lib.rs | duplicate_destroyer |
version | 0.0.8 |
source | src |
created_at | 2023-02-28 23:10:14.90007 |
updated_at | 2023-12-13 13:32:16.748296 |
description | Finds and annihilates duplicate directories. |
repository | https://github.com/jm-fn/duplicate-destroyer |
id | 797577 |
size | 191,560 |
Command line tool that finds duplicate directories and provides their basic handling.
Have you ever backed up a backup folder of a backup folder? Have you then tried to deduplicate the tangled mess with a conventional deduplicator, only to find that you have to check 20 431 files manually? Then the DuDe is for you! DuDe finds the topmost duplicate folders in your filesystem and allows you to effortlessly get rid of all of your duplicates once and for all (or at least until the next backup...).
(Also this is a small project intended as a learning experience with Rust.)
On Linux with Rust 1.64 or higher, install by running:
cargo install --features cli duplicate_destroyer
After the installation is finished, the dude binary will be available.
I have so far tested the installation on Fedora 35+ and on Raspberry Pi OS Bullseye.
The build dependency cc may be missing. If so, first run
apt install build-essential
and then build the DuDe from source:
cargo install --features cli duplicate_destroyer
Warning: The crate is still pretty new, and big changes to the API are to be expected.
Scan a directory for duplicates
dude --path path/to/some/dir --path path/to/another/dir
Once the directory is scanned DuDe will print the duplicate groups found. E.g.:
Group 1/2
--------------------------------
0. "path/to/some/dir/some_dir/A"
1. "path/to/some/dir/other_dir/B"
--------------------------------
Size: 8kB
-----------
Select action and paths. (Or press Ctrl-C to exit program.)
[O]pen, Open [F]older, [D]elete, ReplaceWith[H]ardlink, ReplaceWith[S]oftlink, [N]othing
To act on the items found, type the letter of the action followed by the file numbers. E.g.
O 0 1
will open both files.
D 0
will (upon confirmation) delete "path/to/some/dir/some_dir/A" in our example.
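The command format described above (an action letter followed by path numbers) could be parsed roughly as follows. This is only an illustrative sketch, not the DuDe's actual code: the `Action` enum and the `parse_command` helper are hypothetical names, and rejecting out-of-range indices is an assumption.

```rust
/// Actions offered by the interactive prompt.
#[derive(Debug, PartialEq)]
enum Action {
    Open,
    OpenFolder,
    Delete,
    ReplaceWithHardlink,
    ReplaceWithSoftlink,
    Nothing,
}

/// Parse a line such as "O 0 1" into an action and the selected path indices.
/// Hypothetical helper: returns None for an unknown letter, a non-numeric
/// index, or an index that does not exist in a group of `group_len` paths.
fn parse_command(line: &str, group_len: usize) -> Option<(Action, Vec<usize>)> {
    let mut tokens = line.split_whitespace();
    let action = match tokens.next()?.to_ascii_uppercase().as_str() {
        "O" => Action::Open,
        "F" => Action::OpenFolder,
        "D" => Action::Delete,
        "H" => Action::ReplaceWithHardlink,
        "S" => Action::ReplaceWithSoftlink,
        "N" => Action::Nothing,
        _ => return None,
    };
    let mut indices = Vec::new();
    for token in tokens {
        let index: usize = token.parse().ok()?;
        if index >= group_len {
            return None;
        }
        indices.push(index);
    }
    Some((action, indices))
}
```

With the two-path group from the example, `parse_command("D 0", 2)` would select the delete action for path 0, while an input such as `D 5` would be rejected.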
To configure the number of threads used in calculating checksums use the --jobs flag:
dude --path path/to/some/dir --jobs 3
When using the DuDe with a modern CPU and an external HDD it is usually better to use only one thread (as is the default now), since the program then becomes IO-bound and the parallel access to multiple files from the HDD can reduce the read speed.
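To illustrate what a job count means here, the sketch below spreads checksum work over at most `jobs` worker threads. It makes several assumptions that are not taken from the DuDe's source: the function names are hypothetical, the inputs are in-memory byte buffers rather than files, the standard library's `DefaultHasher` stands in for a real hashing algorithm, and a job count of 0 is treated as one worker.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

/// Stand-in checksum (assumption: a real tool would use a cryptographic hash).
fn checksum(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Checksum every input using at most `jobs` worker threads.
/// Assumption for this sketch: `jobs == 0` is treated as a single worker.
fn checksums(inputs: &[Vec<u8>], jobs: usize) -> Vec<u64> {
    let jobs = jobs.max(1);
    // Ceiling division, so that no more than `jobs` chunks are produced.
    let chunk_len = ((inputs.len() + jobs - 1) / jobs).max(1);
    thread::scope(|scope| {
        let workers: Vec<_> = inputs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|d| checksum(d)).collect::<Vec<u64>>())
            })
            .collect();
        // Joining in spawn order keeps the results aligned with `inputs`.
        workers
            .into_iter()
            .flat_map(|worker| worker.join().expect("worker panicked"))
            .collect()
    })
}
```

The result is the same for any job count; only the amount of concurrent reading changes, which is what matters on a slow external disk.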
The minimum size of the duplicates returned can be specified with the --minimum-size argument. Note, however, that this will not significantly reduce the computation time, since the DuDe still computes the checksum of every file that might have a duplicate. This is done because even large directories might differ in some small files, and by disregarding the small files completely we would run the risk of losing some small but important data.
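The reasoning above can be shown with a small sketch in which a directory's digest is derived from the digests of everything it contains. This is not the DuDe's implementation: the in-memory `Node` tree, the `digest` function, the use of `DefaultHasher`, and the choice to ignore file names are all assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// A tiny in-memory tree standing in for the filesystem (hypothetical type).
enum Node {
    File(Vec<u8>),
    Dir(Vec<Node>),
}

/// Return (digest, total size in bytes) for a node.
/// A directory digest is built from the digests of all its children,
/// so a single differing small file changes the digest of the whole tree.
fn digest(node: &Node) -> (u64, u64) {
    let mut hasher = DefaultHasher::new();
    match node {
        Node::File(data) => {
            hasher.write_u8(0); // tag: file
            hasher.write(data);
            (hasher.finish(), data.len() as u64)
        }
        Node::Dir(children) => {
            let mut parts: Vec<(u64, u64)> = children.iter().map(digest).collect();
            // Sort so that the result does not depend on listing order.
            parts.sort();
            hasher.write_u8(1); // tag: directory
            let mut size = 0u64;
            for (child_digest, child_size) in parts {
                hasher.write_u64(child_digest);
                size += child_size;
            }
            (hasher.finish(), size)
        }
    }
}
```

Two directories holding the same 1000-byte file and a different 1-byte file have the same size but different digests. Skipping the 1-byte files because they fall below a size threshold would wrongly report the directories as duplicates.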
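DuDe can use several hashing algorithms for comparing files (blake2, sha3-256 and sha3-512, according to the help output below). Select one with the --algorithm flag:
dude --path path/to/some/dir --algorithm "sha3-512"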
Usage: dude [OPTIONS] --path <PATH>

Options:
  -p, --path <PATH>                  Add path to be scanned
  -m, --minimum-size <MINIMUM_SIZE>  Minimum size of duplicates considered (can have a metric prefix) [default=100]
  -j, --jobs <JOBS>                  Number of jobs that run simultaneously [default=0]
      --json-file <FILE>             Output the list of duplicates to a file in json format
      --no-interactive               Disable interactive duplicate handling
  -a, --algorithm <ALGORITHM>        Hash algorithm used to compare files [possible values: blake2, sha3-256, sha3-512]
  -h, --help                         Print help
  -V, --version                      Print version
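The usage text notes that --minimum-size can have a metric prefix. The exact syntax the DuDe accepts is not shown here, so the following is only a guess at how such a value might be interpreted: `parse_size` is a hypothetical helper, and the decimal (power-of-1000) prefixes k, M, G and T are assumptions.

```rust
/// Parse a size such as "100", "8k" or "2M" into a number of bytes.
/// Hypothetical helper: decimal prefixes k, M, G and T are assumed;
/// anything else that is not a plain unsigned integer yields None.
fn parse_size(text: &str) -> Option<u64> {
    let text = text.trim();
    // The matched prefixes are single-byte ASCII, so slicing off one byte is safe.
    let (digits, multiplier) = match text.chars().last()? {
        'k' | 'K' => (&text[..text.len() - 1], 1_000u64),
        'M' => (&text[..text.len() - 1], 1_000_000),
        'G' => (&text[..text.len() - 1], 1_000_000_000),
        'T' => (&text[..text.len() - 1], 1_000_000_000_000),
        _ => (text, 1),
    };
    // checked_mul turns an overflow into None instead of wrapping.
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```

Under these assumptions, `parse_size("8k")` gives 8000 bytes and a bare number such as `"100"` is taken as a byte count.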
If you do not like the user interface, you can write your own! The DuDe exposes a library with the core functionality. See the documentation here.