Crates.io | dupdup |
lib.rs | dupdup |
version | 0.3.0 |
source | src |
created_at | 2017-05-17 16:24:38.921022 |
updated_at | 2022-08-08 12:02:18.29082 |
description | Find duplicate file |
homepage | https://github.com/padenot/dupdup |
repository | https://github.com/padenot/dupdup |
max_upload_size | |
id | 14941 |
size | 20,262 |
dupdup
Suite of Python 2.7 programs to solve the following problems:
This happens when you buy a NAS for backups, and then back up multiple machines like an animal without having decided on a sensible archival strategy, so you have multiple copies of everything, but also some files that are exclusive to each machine.
dupdup.py
This program hashes all the files under the specified directory, and finds the
duplicates. With -o file.json, it writes a report in JSON format for further
analysis.
It's multi-pass, because the disk is slow: the first pass hashes the first 4k of each file, and the second pass completely hashes the files that are "possibly dupes", to make sure they really are dupes.
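The two passes described above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the actual code of dupdup.py: the function names (hash_file, find_duplicates), the use of SHA-256, and the 1 MiB read size are all assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict

PREFIX_SIZE = 4096  # the "first 4k" of each file


def hash_file(path, limit=None, chunk_size=1 << 20):
    """Hash a file with SHA-256 (an assumed choice), reading at most `limit` bytes."""
    digest = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(size)
            if not chunk:
                break
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Return {full content hash: [paths]} for every group of 2+ identical files."""
    # Pass 1: cheap hash of the first 4k of every file.
    by_prefix = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path):
                    by_prefix[hash_file(path, limit=PREFIX_SIZE)].append(path)

    # Pass 2: full hash, only for the "possibly dupes" whose prefix hash collided.
    by_content = defaultdict(list)
    for candidates in by_prefix.values():
        if len(candidates) < 2:
            continue
        for path in candidates:
            by_content[hash_file(path)].append(path)

    return {h: sorted(paths) for h, paths in by_content.items() if len(paths) > 1}
```

Files with a unique 4k prefix are read only once and never in full, which is where the time is saved on a slow disk.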
dupdup.py ../some_directory some/directory -o output_file.json
It also has a slightly out-of-date version in dupdup-rs.
{
  hash1 : ["duplicated file 1a",
           "duplicated file 1b",
           ...],
  hash2 : ["duplicated file 2a",
           "duplicated file 2b",
           ...]
}
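For the "further analysis" part, a report of this shape is easy to load and summarize. The sketch below assumes the file on disk is valid JSON mapping each hash string to an array of paths; load_report and summarize_report are hypothetical helper names, not part of dupdup.

```python
import json


def load_report(path):
    """Load a report, assumed to be a JSON object of the form {hash: [paths]}."""
    with open(path) as f:
        return json.load(f)


def summarize_report(report):
    """Return (number of dupe groups, number of redundant copies)."""
    groups = len(report)
    # In each group, every copy beyond the first is redundant.
    redundant = sum(len(paths) - 1 for paths in report.values())
    return groups, redundant
```

For instance, a report with one group of two files and one group of three files has 2 groups and 3 redundant copies.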
merge.py
Given source directory A and destination directory B, tries to find all files in A that are not in B, skipping the files that are in both.
This generates a shell script, full of mv and mkdir -p commands, that is to
be inspected, and then run.
Again, this is not based on the name of the files, but on their content.
merge.py -i source_directory -o destination_directory -f merge_script.sh
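The idea behind the generated script can be sketched like this. It is an illustration under assumptions, not the code of merge.py: the helper names are hypothetical, SHA-256 over the whole file stands in for whatever hashing merge.py uses, and keeping each file's relative path under the destination is a guess at the layout.

```python
import hashlib
import os
import shlex


def content_hash(path, chunk_size=1 << 20):
    """SHA-256 of the whole file (an assumed choice of hash)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_files(root):
    """Yield every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                yield path


def merge_script(source, destination):
    """Return shell script text that moves files found only in source."""
    # Content already present in the destination, whatever the file names are.
    known = {content_hash(p) for p in walk_files(destination)}
    lines = ["#!/bin/sh"]
    for path in walk_files(source):
        if content_hash(path) in known:
            continue  # same content exists in the destination: skip it
        # Assumption: the relative path under source is kept under destination.
        target = os.path.join(destination, os.path.relpath(path, source))
        lines.append("mkdir -p " + shlex.quote(os.path.dirname(target)))
        lines.append("mv " + shlex.quote(path) + " " + shlex.quote(target))
    return "\n".join(lines) + "\n"
```

This sketch does not check for name collisions in the destination, which is one more reason to read the script before running it.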
dupdup.html
A web page that can be opened directly without a server, and helps delete dupes.
It accepts a JSON file generated by dupdup.py, and displays each dupe tuple on
a line.
One can then click on the file to keep amongst all the copies, and also shift-click to select a column: click on an item, hold shift, then click on another item to select a range.
Once a good number of files have been picked, clicking on the export script
button generates a shell script to inspect and then to copy to the remote
machine, to delete all the files that:
i.e., it will not touch files that have no element picked on their line.
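The selection rule can be sketched as below, in Python for consistency with the rest of the suite, even though the page itself runs in a browser. This is an interpretation, not the page's code: it assumes that on any line with at least one picked file, the unpicked copies are the ones deleted, and export_script is a hypothetical name.

```python
import shlex


def export_script(report, keep):
    """Return shell script text deleting unpicked copies on lines that have a pick.

    report: {hash: [paths]} as written by dupdup.py
    keep:   paths the user clicked on, i.e. the copies to keep
    """
    keep = set(keep)
    lines = ["#!/bin/sh"]
    for _hash, paths in sorted(report.items()):
        if not keep.intersection(paths):
            continue  # no element picked on this line: touch nothing
        for path in paths:
            if path not in keep:
                lines.append("rm " + shlex.quote(path))
    return "\n".join(lines) + "\n"
```

With two dupe lines and a single pick on the first one, only the other copy on that first line ends up in the script; the second line is left alone.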