find_duplicate_files

Crates.io: find_duplicate_files
lib.rs: find_duplicate_files
version: 0.28.0
source: src
created_at: 2023-07-26 00:49:17.187799
updated_at: 2024-04-11 13:00:31.047929
description: find identical files according to their size and hashing algorithm
homepage: https://github.com/claudiofsr/find_duplicate_files
repository: https://github.com/claudiofsr/find_duplicate_files
max_upload_size:
id: 926110
size: 118,206
owner: claudiofsr (claudiofsr)

documentation: https://docs.rs/find_duplicate_files

README

New project name

This project has been renamed to: find-identical-files.

old project name: find_duplicate_files

find_duplicate_files

Find identical files according to their size and hashing algorithm.

"A hash function is a mathematical algorithm that takes an input (in this case, a file) and produces a fixed-size string of characters, known as a hash value or checksum. The hash value acts as a summary representation of the original input. This hash value is unique (disregarding unlikely collisions) to the input data, meaning even a slight change in the input will result in a completely different hash value."

Hash algorithm options are:

  1. ahash (used by hashbrown)

  2. blake3 (default)

  3. fxhash (used by Firefox and rustc)

  4. sha256

  5. sha512
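
As a hypothetical sketch of how one algorithm among several might be selected (the names and structure here are assumptions, not the crate's code), assuming the blake3 and sha2 crates for illustration:

    use sha2::{Digest, Sha256};

    // Hypothetical selector: any of the supported hash functions can serve
    // as the fingerprint, as long as the same one is used for every file.
    enum Algorithm {
        Blake3,
        Sha256,
    }

    fn digest_hex(algorithm: &Algorithm, data: &[u8]) -> String {
        match algorithm {
            Algorithm::Blake3 => blake3::hash(data).to_hex().to_string(),
            Algorithm::Sha256 => Sha256::digest(data)
                .iter()
                .map(|byte| format!("{byte:02x}"))
                .collect(),
        }
    }

    fn main() {
        let data = b"example file contents";
        println!("blake3: {}", digest_hex(&Algorithm::Blake3, data));
        println!("sha256: {}", digest_hex(&Algorithm::Sha256, data));
    }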

find_duplicate_files only reads files and never changes their contents. See the function fn open_file() in the source to verify.
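
For example, a read-only open in Rust, analogous to (but not necessarily identical to) the crate's fn open_file(), could look like this; File::open never grants write access:

    use std::fs::File;
    use std::io::{BufReader, Read};
    use std::path::Path;

    // Analogous to the crate's fn open_file() (the real code may differ):
    // File::open returns a read-only handle, so the file on disk
    // cannot be modified through it.
    fn open_read_only(path: &Path) -> std::io::Result<BufReader<File>> {
        Ok(BufReader::new(File::open(path)?))
    }

    fn main() -> std::io::Result<()> {
        let mut contents = Vec::new();
        open_read_only(Path::new("Cargo.toml"))?.read_to_end(&mut contents)?;
        println!("read {} bytes without modifying the file", contents.len());
        Ok(())
    }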

Usage examples

1. To find duplicate files in the current directory, run the command:

find_duplicate_files

2. To search the current directory for groups of at least 5 identical files, run the command:

find_duplicate_files -n 5

With the --min_number (or -n) option, set the 'minimum number of identical files' to be reported.

With the --max_number (or -N) option, set the 'maximum number of identical files' to be reported.

If n = 0 or n = 1, all files will be reported.

If n = 2 (the default), only files with two or more identical copies (duplicates) are reported. A simplified sketch of this filtering appears after the usage examples.

3. To find duplicate files with the fxhash algorithm and report the result in yaml format:

find_duplicate_files -twa fxhash -r yaml

4. To find duplicate files in the Downloads directory and redirect the output to a json file for further analysis:

find_duplicate_files -vi ~/Downloads -r json > fdf.json

5. To find duplicate files in the current directory whose size is greater than or equal to 8 bytes:

find_duplicate_files -b 8

6. To find duplicate files in the current directory whose size is less than or equal to 1024 bytes:

find_duplicate_files -B 1024

7. To find duplicate files in the current directory whose size is between 8 and 1024 bytes:

find_duplicate_files -b 8 -B 1024

8. To find duplicate files in the current directory whose size is exactly 1024 bytes:

find_duplicate_files -b 1024 -B 1024

9. Export duplicate file information from the current directory to a CSV file (fdf.csv).

9.1 The CSV file will be saved in the current directory:

find_duplicate_files -c .

9.2 The CSV file will be saved in the /tmp directory:

find_duplicate_files --csv_dir=/tmp

10. Export duplicate file information from the current directory to an XLSX file (fdf.xlsx).

10.1 The XLSX file will be saved in the ~/Downloads directory:

find_duplicate_files -x ~/Downloads

10.2 The XLSX file will be saved in the /tmp directory:

find_duplicate_files --xlsx_dir=/tmp

11. To find duplicate files in the Downloads directory and export the result to /tmp/fdf.xlsx with the ahash algorithm:

find_duplicate_files -twi ~/Downloads -x /tmp -a ahash
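
The examples above only change command-line flags. Conceptually, a finder of this kind groups candidate files by size, hashes only the groups that could contain duplicates, and keeps the hash groups whose member count falls in the requested range. The following is a simplified, hypothetical sketch of that pipeline (not the crate's actual code), assuming the blake3 crate and a non-recursive scan of the current directory:

    use std::{collections::HashMap, fs, io::Read, path::PathBuf};

    // Simplified, hypothetical pipeline: group files by size, hash only the
    // size groups with more than one member, then report hash groups whose
    // member count lies in [min_number, max_number].
    fn main() -> std::io::Result<()> {
        let (min_size, max_size) = (8u64, u64::MAX); // like -b 8 (no -B given)
        let (min_number, max_number) = (2usize, usize::MAX); // like -n 2 (default)

        // 1. Collect regular files in the current directory, filtered by size.
        let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for entry in fs::read_dir(".")? {
            let entry = entry?;
            let metadata = entry.metadata()?;
            if metadata.is_file() && (min_size..=max_size).contains(&metadata.len()) {
                by_size.entry(metadata.len()).or_default().push(entry.path());
            }
        }

        // 2. Files with a unique size cannot have duplicates; hash the rest.
        let mut by_hash: HashMap<String, Vec<PathBuf>> = HashMap::new();
        for paths in by_size.values().filter(|p| p.len() > 1) {
            for path in paths {
                let mut bytes = Vec::new();
                fs::File::open(path)?.read_to_end(&mut bytes)?;
                let digest = blake3::hash(&bytes).to_hex().to_string();
                by_hash.entry(digest).or_default().push(path.clone());
            }
        }

        // 3. Report groups within the requested number of identical files.
        for (digest, paths) in &by_hash {
            if (min_number..=max_number).contains(&paths.len()) {
                println!("{digest}: {paths:?}");
            }
        }
        Ok(())
    }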

Help

Type find_duplicate_files -h in the terminal to see the help message and all available options:

find identical files according to their size and hashing algorithm

Usage: find_duplicate_files [OPTIONS]

Options:
  -a, --algorithm <ALGORITHM>
          Choose the hash algorithm [default: blake3] [possible values: ahash, blake3, fxhash, sha256, sha512]
  -b, --min_size <MIN_SIZE>
          Set a minimum file size (in bytes) to search for duplicate files
  -B, --max_size <MAX_SIZE>
          Set a maximum file size (in bytes) to search for duplicate files
  -c, --csv_dir <CSV_DIR>
          Set the output directory for the CSV file (fdf.csv)
  -d, --min_depth <MIN_DEPTH>
          Set the minimum depth to search for duplicate files
  -D, --max_depth <MAX_DEPTH>
          Set the maximum depth to search for duplicate files
  -f, --full_path
          Prints full path of duplicate files, otherwise relative path
  -g, --generate <GENERATOR>
          If provided, outputs the completion file for given shell [possible values: bash, elvish, fish, powershell, zsh]
  -i, --input_dir <INPUT_DIR>
          Set the input directory where to search for duplicate files [default: current directory]
  -n, --min_number <MIN_NUMBER>
          Minimum 'number of identical files' to be reported
  -N, --max_number <MAX_NUMBER>
          Maximum 'number of identical files' to be reported
  -o, --omit_hidden
          Omit hidden files (starts with '.'), otherwise search all files
  -r, --result_format <RESULT_FORMAT>
          Print the result in the chosen format [default: personal] [possible values: json, yaml, personal]
  -s, --sort
          Sort result by number of duplicate files, otherwise sort by file size
  -t, --time
          Show total execution time
  -v, --verbose
          Show intermediate runtime messages
  -w, --wipe_terminal
          Wipe (Clear) the terminal screen before listing the duplicate files
  -x, --xlsx_dir <XLSX_DIR>
          Set the output directory for the XLSX file (fdf.xlsx)
  -h, --help
          Print help (see more with '--help')
  -V, --version
          Print version

Building

To build and install from source, run the following command:

cargo install find_duplicate_files

Another option is to install from GitHub:

cargo install --git https://github.com/claudiofsr/find_duplicate_files.git

Mutually exclusive features

Walking a directory recursively: jwalk or walkdir.

In general, jwalk (default) is faster than walkdir.

But if you prefer to use walkdir:

cargo install --features walkdir find_duplicate_files
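
For reference, a minimal sketch of recursive traversal with walkdir (jwalk exposes a very similar iterator API, with parallel directory reads); this is illustrative only, not the crate's code:

    use walkdir::WalkDir;

    // Recursively walk the current directory and print every regular file,
    // skipping entries that cannot be read.
    fn main() {
        for entry in WalkDir::new(".").into_iter().filter_map(Result::ok) {
            if entry.file_type().is_file() {
                println!("{}", entry.path().display());
            }
        }
    }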