| Field | Value |
|---|---|
| Crates.io | dupefinder |
| lib.rs | dupefinder |
| version | 0.2.0 |
| source | src |
| created_at | 2023-12-30 14:29:41.130746 |
| updated_at | 2024-01-01 00:37:18.717195 |
| description | A duplicate file finding utility library that supports directory recursion, multiple directories, and specific file duplicate searching. |
| homepage | https://github.com/vgo0/dupefinder |
| repository | https://github.com/vgo0/dupefinder |
| max_upload_size | |
| id | 1084246 |
| size | 51,853 |
`dupefinder` is a utility for finding duplicate files within a set of folders. The contents of each folder are evaluated against all other provided folders, so if file `a.jpg` in folder `one` also exists as `b.jpg` in folder `two`, the two files are considered a match.
This utility works by parsing file metadata within the provided folders and grouping together all files with the same size in bytes. Once sizes with multiple file entries are located, the file contents are hashed via XXH3 / xxHash and compared to the hash of other same-size files.
If only a single file of a given size is found, that file is skipped without being read. Files that do share a size are read in full from disk to generate their hashes.
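The size-grouping step can be sketched with only the standard library. This is an illustration, not the crate's code: `group_by_size` is a hypothetical helper that takes `(path, size)` pairs, as if already read from file metadata, and keeps only the sizes shared by at least two files.

```rust
use std::collections::HashMap;

/// Hypothetical helper: groups paths by file size in bytes and drops every
/// size that only one file has. A file with a unique size cannot have a
/// duplicate, so it never needs to be opened or hashed.
fn group_by_size(entries: &[(String, u64)]) -> HashMap<u64, Vec<String>> {
    let mut by_size: HashMap<u64, Vec<String>> = HashMap::new();
    for (path, size) in entries {
        by_size.entry(*size).or_default().push(path.clone());
    }
    // Keep only sizes shared by two or more files; only these get hashed.
    by_size.retain(|_, paths| paths.len() > 1);
    by_size
}
```

Grouping by size first is cheap because it only needs metadata, and it usually eliminates most files before any contents are read.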
Hashing uses a `BufReader` to read files incrementally, so a large file does not have to be loaded into memory all at once to compute its hash.
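A minimal sketch of incremental hashing through a `BufReader`, assuming any `Read` source. The crate uses XXH3; to keep the sketch dependency-free it swaps in the standard library's `DefaultHasher` (SipHash), which is fine for illustration but is not xxHash. `hash_reader` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: hashes everything `reader` yields, one buffered
/// chunk at a time. DefaultHasher (SipHash) stands in for XXH3 here.
fn hash_reader<R: Read>(reader: R) -> io::Result<u64> {
    // 64 KiB buffer: memory use stays bounded regardless of input size.
    let mut reader = BufReader::with_capacity(64 * 1024, reader);
    let mut hasher = DefaultHasher::new();
    loop {
        // Borrow the next chunk from the internal buffer, refilling it if needed.
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break; // end of input
        }
        let len = chunk.len();
        hasher.write(chunk);
        // Mark the chunk as used so the next fill_buf returns fresh data.
        reader.consume(len);
    }
    Ok(hasher.finish())
}
```

For a file on disk, pass `std::fs::File::open(path)?` as the reader. The 64 KiB buffer size is an arbitrary choice for the sketch.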
If two or more files of the same size produce the same hash, they are treated as duplicates and returned.
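Putting the two stages together, the size-then-hash matching described above can be sketched in memory. The hypothetical `find_duplicates` works on `(name, contents)` pairs instead of real files and again uses `DefaultHasher` in place of XXH3; it is an illustration of the strategy, not the crate's implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Hypothetical helper: returns groups of names whose contents are identical.
fn find_duplicates(files: &[(&str, &[u8])]) -> Vec<Vec<String>> {
    // Step 1: bucket by size in bytes.
    let mut by_size: HashMap<usize, Vec<(&str, &[u8])>> = HashMap::new();
    for &(name, data) in files {
        by_size.entry(data.len()).or_default().push((name, data));
    }

    let mut duplicates = Vec::new();
    for group in by_size.into_values() {
        // A unique size cannot have a duplicate, so it is never hashed.
        if group.len() < 2 {
            continue;
        }
        // Step 2: within a same-size group, bucket by content hash.
        let mut by_hash: HashMap<u64, Vec<String>> = HashMap::new();
        for (name, data) in group {
            let mut hasher = DefaultHasher::new();
            hasher.write(data);
            by_hash.entry(hasher.finish()).or_default().push(name.to_string());
        }
        // Step 3: every hash shared by two or more files is a duplicate set.
        duplicates.extend(by_hash.into_values().filter(|names| names.len() > 1));
    }
    duplicates
}
```

Because a 64-bit hash can in principle collide, a stricter tool could byte-compare files that share a hash; this sketch, like the description above, treats a hash match as a duplicate.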
Matching can be run more than once on a single `DupeFinder` via `.run()`. Each call is a full re-check of all folders, on the assumption that file contents may have changed, not just which files are present.
Matching actively skips (continues) past problems. Warnings are emitted through the `log` crate when such problems arise, but they are otherwise not reported. Because the library supports multiple directories and large numbers of files, stopping on a single error was not desirable.
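The skip-and-warn behaviour might look roughly like the following. `sizes_skipping_errors` is hypothetical, and `eprintln!` stands in for the `log` crate's `warn!` macro so the sketch needs no dependencies.

```rust
use std::fs;

/// Hypothetical helper: collects (path, size) for every path whose metadata
/// can be read. Failures produce a warning and are skipped, so one bad path
/// never aborts the whole scan.
fn sizes_skipping_errors(paths: &[String]) -> Vec<(String, u64)> {
    let mut sizes = Vec::new();
    for path in paths {
        match fs::metadata(path) {
            // Regular file: record its size in bytes.
            Ok(meta) if meta.is_file() => sizes.push((path.clone(), meta.len())),
            // Directories and other non-files are ignored.
            Ok(_) => {}
            // Any error: warn (stand-in for log::warn!) and move on.
            Err(err) => eprintln!("warning: skipping {}: {}", path, err),
        }
    }
    sizes
}
```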
There is also a `.run_for_file()` mode that searches only for duplicates of a specific file.
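One plausible way to implement a single-file search, sketched under the assumption that it reuses the size-then-hash strategy described above: only candidates with the target's length are hashed. `matches_for_file` and its in-memory `(name, contents)` input are hypothetical, and `DefaultHasher` stands in for XXH3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hashes a byte slice (DefaultHasher as a stand-in for XXH3).
fn content_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Hypothetical helper: names of candidates whose contents match `target`.
fn matches_for_file(target: &[u8], candidates: &[(&str, &[u8])]) -> Vec<String> {
    let target_hash = content_hash(target);
    candidates
        .iter()
        // Cheap size check first; different sizes can never match.
        .filter(|(_, data)| data.len() == target.len())
        // Only same-size candidates pay the cost of hashing.
        .filter(|(_, data)| content_hash(data) == target_hash)
        .map(|(name, _)| name.to_string())
        .collect()
}
```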
Install with:

```shell
cargo add dupefinder
```

Crate: https://crates.io/crates/dupefinder

Documentation: https://docs.rs/dupefinder/latest/dupefinder/
Find duplicates across one or more directories with `DupeFinder::new`:

```rust
fn main() {
    // Directories whose contents are compared against each other.
    let directories = vec![String::from("./resources")];
    let mut checker = dupefinder::DupeFinder::new(directories);

    // Each value describes one set of files sharing a size and hash.
    let results = checker.run();
    for details in results.values() {
        println!(
            "{} files of size {} bytes found with hash {}",
            details.files.len(),
            details.size,
            details.hash
        );
        for file in details.files.iter() {
            println!("{}", file);
        }
    }
}
```
To also search subdirectories, use `DupeFinder::new_recursive`:

```rust
fn main() {
    let directories = vec![String::from("./resources")];
    // Same as `new`, but also descends into subdirectories.
    let mut checker = dupefinder::DupeFinder::new_recursive(directories);

    let results = checker.run();
    for details in results.values() {
        println!(
            "{} files of size {} bytes found with hash {}",
            details.files.len(),
            details.size,
            details.hash
        );
        for file in details.files.iter() {
            println!("{}", file);
        }
    }
}
```
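The crate's directory traversal is not shown in this README. Below is a std-only sketch of how the `new` / `new_recursive` distinction could work; `collect_files` is hypothetical. It does not follow symlinks (a choice made here to avoid cycles, not documented behaviour of the crate), and it skips unreadable directories with a warning, in line with the error handling described above.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: collects regular files under `dir`. With
/// `recursive == false` only the top level is listed.
fn collect_files(dir: &Path, recursive: bool, out: &mut Vec<PathBuf>) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(err) => {
            // Skip unreadable directories instead of aborting the scan.
            eprintln!("warning: skipping {}: {}", dir.display(), err);
            return;
        }
    };
    // Entries that cannot be read are dropped by `flatten`.
    for entry in entries.flatten() {
        // DirEntry::file_type does not follow symlinks, so symlinked
        // files and directories are neither listed nor entered.
        let Ok(file_type) = entry.file_type() else { continue };
        if file_type.is_dir() {
            if recursive {
                collect_files(&entry.path(), true, out);
            }
        } else if file_type.is_file() {
            out.push(entry.path());
        }
    }
}
```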
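To search only for duplicates of a single file, use `run_for_file`:

```rust
fn main() {
    let directories = vec![String::from("./resources")];
    let mut checker = dupefinder::DupeFinder::new(directories);

    // Look only for duplicates of ./test.txt within ./resources.
    // This example ignores any error returned by run_for_file.
    if let Ok(result) = checker.run_for_file(String::from("./test.txt")) {
        match result {
            Some(duplicate) => println!("{} files found", duplicate.files.len()),
            None => println!("no matching files found"),
        }
    }
}
```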