dupefinder

Crates.io: dupefinder
lib.rs: dupefinder
version: 0.2.0
source: src
created_at: 2023-12-30 14:29:41.130746
updated_at: 2024-01-01 00:37:18.717195
description: A duplicate file finding utility library that supports directory recursion, multiple directories, and specific file duplicate searching.
homepage: https://github.com/vgo0/dupefinder
repository: https://github.com/vgo0/dupefinder
id: 1084246
size: 51,853
(vgo0)


Dupe Finder

dupefinder is a utility for finding duplicate files within a set of folders. The contents of each folder are evaluated against all other provided folders, so if file 'a.jpg' in folder 'one' also exists as 'b.jpg' in folder 'two', the two files are considered a match.

This utility works by parsing file metadata within the provided folders and grouping together all files with the same size in bytes. Once sizes with multiple file entries are located, the file contents are hashed via XXH3 / xxHash and compared to the hash of other same-size files.

If only a single file of a certain size is found, that file is skipped without being read. For files that do share a size, the entire file contents are read from disk while generating the hash.
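The size-based pre-filter described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the crate's actual implementation; the helper name `candidates_by_size` is hypothetical.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;

/// Groups paths by file size and keeps only sizes shared by two or more files.
/// Hypothetical helper for illustration; not part of dupefinder's API.
fn candidates_by_size(paths: &[PathBuf]) -> HashMap<u64, Vec<PathBuf>> {
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for path in paths {
        // Only metadata is consulted here; file contents are not read.
        let Ok(meta) = fs::metadata(path) else {
            continue; // entries whose metadata cannot be read are skipped
        };
        if meta.is_file() {
            by_size.entry(meta.len()).or_default().push(path.clone());
        }
    }
    // A size seen only once cannot have a duplicate, so drop it.
    by_size.retain(|_, files| files.len() > 1);
    by_size
}
```

Only the groups that survive the final `retain` would need to be hashed, which is what keeps unique-size files from ever being opened.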

Hashing uses a BufReader to read large files incrementally, which should avoid having to load an entire file into memory at once to generate the hash.
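As a rough sketch of that incremental approach, the function below feeds a file to a running hash one fixed-size chunk at a time. To stay dependency-free it uses a hand-written FNV-1a hash as a stand-in for XXH3, and the names `fnv1a_update` and `hash_file` and the 8 KiB chunk size are assumptions for illustration, not the crate's code.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds one chunk into a running FNV-1a state (stand-in for XXH3).
fn fnv1a_update(mut state: u64, chunk: &[u8]) -> u64 {
    for &byte in chunk {
        state ^= u64::from(byte);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// Hashes a file chunk by chunk so memory use stays bounded by the buffer size.
fn hash_file(path: &Path) -> io::Result<u64> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut buf = [0u8; 8192];
    let mut state = FNV_OFFSET;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        state = fnv1a_update(state, &buf[..n]);
    }
    Ok(state)
}
```

Because the state is updated byte by byte, the result does not depend on where the chunk boundaries fall, so a large file hashes to the same value as it would if read in one piece.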

If two files of the same size produce the same hash, they are treated as duplicates and returned.
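The hash comparison step might look something like the sketch below. For brevity it works on in-memory (name, contents) pairs rather than files, and it uses the standard library's `DefaultHasher` as a stand-in for XXH3; `content_hash` and `duplicate_groups` are hypothetical names, not dupefinder functions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Hashes a byte slice with std's DefaultHasher (stand-in for XXH3).
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Given same-size candidates, returns the groups of names whose contents hash equally.
fn duplicate_groups(candidates: &[(&str, &[u8])]) -> Vec<Vec<String>> {
    let mut by_hash: HashMap<u64, Vec<String>> = HashMap::new();
    for (name, bytes) in candidates {
        by_hash
            .entry(content_hash(bytes))
            .or_default()
            .push(name.to_string());
    }
    // A hash with a single entry has no duplicate, so it is dropped.
    let mut groups: Vec<Vec<String>> = by_hash
        .into_values()
        .filter(|names| names.len() > 1)
        .collect();
    groups.sort(); // deterministic order for display
    groups
}
```

Note that this treats equal hashes as equal contents without a byte-by-byte comparison, which mirrors the behavior described in the text.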

Matching can be run more than once on a single DupeChecker via .run(). Each call is a full re-check of all folders, on the assumption that file contents, not just the presence of files, may have changed.

Matching will actively skip (continue) past problems. Warnings are emitted via the log crate when such problems arise, but they are otherwise not reported. Because multiple directories and large numbers of files are supported, stopping on a single error was not desired.
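A skip-and-warn traversal along these lines could be sketched as below. This is an illustration under assumptions, not the crate's code: `collect_files` is a hypothetical helper, and `eprintln!` stands in for the log crate's `warn!` macro so the example needs no dependencies.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Collects regular files under `dir`, descending into subfolders when
/// `recursive` is true. Problems are reported and skipped, never returned.
fn collect_files(dir: &Path, recursive: bool, out: &mut Vec<PathBuf>) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(err) => {
            // Stand-in for log::warn!; the walk simply moves on.
            eprintln!("warning: cannot read {}: {}", dir.display(), err);
            return;
        }
    };
    for entry in entries {
        // A failure on one entry is reported and the loop continues.
        let (path, file_type) = match entry.and_then(|e| Ok((e.path(), e.file_type()?))) {
            Ok(pair) => pair,
            Err(err) => {
                eprintln!("warning: skipping entry in {}: {}", dir.display(), err);
                continue;
            }
        };
        // file_type() does not follow symlinks, so links are ignored here.
        if file_type.is_dir() {
            if recursive {
                collect_files(&path, recursive, out);
            }
        } else if file_type.is_file() {
            out.push(path);
        }
    }
}
```

Calling it once per provided directory with a shared `out` vector would yield one combined file list, and an unreadable directory only costs a warning rather than aborting the whole run.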

There is an additional .run_for_file() mode that will only search for duplicates of a specific file.

Install

cargo add dupefinder

https://crates.io/crates/dupefinder

Docs

https://docs.rs/dupefinder/latest/dupefinder/

Examples

Non-recursive

// Only the top level of each directory is searched.
let directories = vec![String::from("./resources")];
let mut checker = dupefinder::DupeFinder::new(directories);
let results = checker.run();

// Each value describes one group of files with identical size and hash.
for details in results.values() {
    println!("{} files of size {} bytes found with hash {}", details.files.len(), details.size, details.hash);
    for file in details.files.iter() {
        println!("{}", file);
    }
}

Recursive subfolder search

// Subfolders of each directory are searched as well.
let directories = vec![String::from("./resources")];
let mut checker = dupefinder::DupeFinder::new_recursive(directories);
let results = checker.run();

// Each value describes one group of files with identical size and hash.
for details in results.values() {
    println!("{} files of size {} bytes found with hash {}", details.files.len(), details.size, details.hash);
    for file in details.files.iter() {
        println!("{}", file);
    }
}

Specific file search

let directories = vec![String::from("./resources")];
let mut checker = dupefinder::DupeFinder::new(directories);
let results = checker.run_for_file(String::from("./test.txt"));

// Ok(Some(_)) means duplicates were found; Ok(None) means there were none.
match results {
    Ok(Some(duplicate)) => println!("{} files found", duplicate.files.len()),
    Ok(None) => println!("no matching files found"),
    Err(_) => eprintln!("could not check ./test.txt for duplicates"),
}