Duplicate File Finder

A fast, parallelized CLI tool and library for detecting duplicate files by content. Designed for efficiency, usability, and cross-platform compatibility.

Features

  • Recursively scans directories for duplicate files
  • Detects duplicates using a multi-stage strategy (see the sketch after this list):
    • Group by file size
    • Compare quick hash (first 8 KB using twox-hash)
    • Validate full content with SHA-256
  • Generates detailed reports with metadata and potential space savings
  • Supports progress indicators and structured logging
  • Multithreaded using rayon for high performance
  • Usable as both a CLI tool and a Rust library
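
The staged strategy can be sketched roughly as follows. This is not the crate's actual implementation: for brevity it uses the standard library's DefaultHasher as a stand-in for both twox-hash (quick pass) and SHA-256 (full pass) and runs sequentially, but it follows the same size → quick hash → full hash funnel. group_duplicates expects the flat list of file paths produced by a directory walk.

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::Hasher;
use std::io::{BufReader, Read};
use std::path::{Path, PathBuf};

// Hash only the first 8 KB of a file (a robust version would loop until
// 8 KB have been read or EOF is reached, rather than relying on one read call).
fn quick_hash(path: &Path) -> std::io::Result<u64> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut buf = [0u8; 8192];
    let n = reader.read(&mut buf)?;
    let mut h = DefaultHasher::new();
    h.write(&buf[..n]);
    Ok(h.finish())
}

// Hash the entire file through a buffered reader.
fn full_hash(path: &Path) -> std::io::Result<u64> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut h = DefaultHasher::new();
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        h.write(&buf[..n]);
    }
    Ok(h.finish())
}

// Return groups of files whose contents appear identical.
fn group_duplicates(files: Vec<PathBuf>) -> std::io::Result<Vec<Vec<PathBuf>>> {
    // Stage 1: bucket by file size; a file with a unique size has no duplicate.
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for f in files {
        by_size.entry(fs::metadata(&f)?.len()).or_default().push(f);
    }

    let mut groups = Vec::new();
    for (_, same_size) in by_size.into_iter().filter(|(_, v)| v.len() > 1) {
        // Stage 2: bucket by a cheap hash of the first 8 KB.
        let mut by_quick: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for f in same_size {
            by_quick.entry(quick_hash(&f)?).or_default().push(f);
        }
        for (_, candidates) in by_quick.into_iter().filter(|(_, v)| v.len() > 1) {
            // Stage 3: confirm with a hash over the full contents.
            let mut by_full: HashMap<u64, Vec<PathBuf>> = HashMap::new();
            for f in candidates {
                by_full.entry(full_hash(&f)?).or_default().push(f);
            }
            groups.extend(by_full.into_values().filter(|v| v.len() > 1));
        }
    }
    Ok(groups)
}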

Installation

Add the crate as a dependency in your Cargo.toml:

[dependencies]
duplicate_file_finder = "0.1"

Or install the CLI binary:

cargo install duplicate_file_finder

Usage

Command Line

duplicate_file_finder [--output <file_or_directory>]
duplicate_file_finder <directory> [--output <file_or_directory>]
duplicate_file_finder --directories <dir1> <dir2> ... [--output <file_or_directory>]

Example

duplicate_file_finder ~/Documents --output reports/

This scans ~/Documents and writes a human-readable report to reports/duplicate_file_report.txt.

Running duplicate_file_finder with no arguments scans the directory it is executed from and saves duplicate_file_report.txt in that same directory.
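
Multiple directories can be scanned as a single pool with the --directories form shown above, for example:

duplicate_file_finder --directories ~/Pictures ~/Backups --output reports/

Here ~/Pictures and ~/Backups are just example paths; duplicates are detected across all listed directories and a single report is written to reports/duplicate_file_report.txt.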

Options

Option                      Description
-h, --help                  Show help message
--output <path>             Specify output file or directory for the report
-d, --directories <DIR>     Scan multiple directories as a single pool

If the output path is a directory, the report is saved as duplicate_file_report.txt within that directory.

Sample Output

Duplicate File Finder Report
Generated by: alice
Start Time: 20250707 15:00:00
End Time:   20250707 15:00:42
Base Directory: /home/alice/Documents

Total Potential Space Savings: 1.43 GB

Size: 143.21 MB
/home/alice/Documents/archive/copy1.iso
/home/alice/Documents/archive/copy2.iso

Library Usage

You can also integrate the crate into your own Rust projects:

use duplicate_file_finder::{find_duplicates, write_output, setup_logger};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    setup_logger()?;
    let base_dir = Path::new("/some/path");
    let duplicates = find_duplicates(base_dir);
    write_output(duplicates, "report.txt", "20250707 15:00:00", &[base_dir.to_path_buf()])?;
    Ok(())
}

Logging

Logs are written to duplicate_finder.log and include timestamps and severity levels.

Platform Support

  • Linux
  • macOS
  • Windows

Performance

The tool is optimized for performance using:

  • Parallel iteration via rayon (see the sketch after this list)
  • Incremental filtering (size → quick hash → full hash)
  • Efficient I/O with buffered reading
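
As a rough illustration of the rayon-based parallelism and buffered reading (not the crate's actual code: it assumes rayon has been added as a dependency and substitutes the standard library's DefaultHasher for the real quick hash):

use rayon::prelude::*;
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{BufReader, Read};
use std::path::PathBuf;

fn main() {
    // Hypothetical driver: file paths are passed on the command line.
    let files: Vec<PathBuf> = std::env::args().skip(1).map(PathBuf::from).collect();

    // par_iter() spreads the per-file work across rayon's thread pool;
    // each task reads its file through its own buffered reader.
    let hashes: Vec<(PathBuf, u64)> = files
        .par_iter()
        .filter_map(|path| {
            let mut reader = BufReader::new(File::open(path).ok()?);
            let mut buf = [0u8; 8192];
            let n = reader.read(&mut buf).ok()?;
            let mut h = DefaultHasher::new();
            h.write(&buf[..n]);
            Some((path.clone(), h.finish()))
        })
        .collect();

    for (path, hash) in &hashes {
        println!("{:016x}  {}", hash, path.display());
    }
}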

Development

Running Tests

cargo test

Building the Binary

cargo build --release

License

This project is licensed under the MIT License. See LICENSE for details.

Contributing

Contributions, issues, and feature requests are welcome!

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/awesome)
  3. Commit your changes (git commit -am 'Add awesome feature')
  4. Push to the branch (git push origin feature/awesome)
  5. Create a new Pull Request

Made with ❤️ by Andrew Sims
