Crates.io | find_duplicate_files |
lib.rs | find_duplicate_files |
version | 0.28.0 |
source | src |
created_at | 2023-07-26 00:49:17.187799 |
updated_at | 2024-04-11 13:00:31.047929 |
description | find identical files according to their size and hashing algorithm |
homepage | https://github.com/claudiofsr/find_duplicate_files |
repository | https://github.com/claudiofsr/find_duplicate_files |
max_upload_size | |
id | 926110 |
size | 118,206 |
This project has been renamed to: find-identical-files
.
old project name: find_duplicate_files
Find identical files according to their size and hashing algorithm.
"A hash function is a mathematical algorithm that takes an input (in this case, a file) and produces a fixed-size string of characters, known as a hash value or checksum. The hash value acts as a summary representation of the original input. This hash value is unique (disregarding unlikely collisions) to the input data, meaning even a slight change in the input will result in a completely different hash value."
Hash algorithm options are:
find_duplicate_files just reads the files and never changes their contents. See the function fn open_file() to verify.
find_duplicate_files
find_duplicate_files -n 5
With the --min_number (or -n) argument option, set the 'minimum number of identical files'.
With the --max_number (or -N) argument option, set the 'maximum number of identical files'.
If n = 0 or n = 1, all files will be reported.
If n = 2 (default), look for duplicate files or more identical files.
fxhash
algorithm and yaml
format:find_duplicate_files -twa fxhash -r yaml
Downloads
directory and redirect the output to a json
file for further analysis:find_duplicate_files -vi ~/Downloads -r json > fdf.json
find_duplicate_files -b 8
find_duplicate_files -B 1024
find_duplicate_files -b 8 -B 1024
find_duplicate_files -b 1024 -B 1024
8.1 The CSV file will be saved in the currenty directory:
find_duplicate_files -c .
8.2 The CSV file will be saved in the /tmp
directory:
find_duplicate_files --csv_dir=/tmp
9.1 The XLSX file will be saved in the ~/Downloads
directory:
find_duplicate_files -x ~/Downloads
9.2 The XLSX file will be saved in the /tmp
directory:
find_duplicate_files --xlsx_dir=/tmp
Downloads
directory and export the result to /tmp/fdf.xlsx
with the ahash
algorithm:find_duplicate_files -twi ~/Downloads -x /tmp -a ahash
Type in the terminal find_duplicate_files -h
to see the help messages and all available options:
find identical files according to their size and hashing algorithm
Usage: find_duplicate_files [OPTIONS]
Options:
-a, --algorithm <ALGORITHM>
Choose the hash algorithm [default: blake3] [possible values: ahash, blake3, fxhash, sha256, sha512]
-b, --min_size <MIN_SIZE>
Set a minimum file size (in bytes) to search for duplicate files
-B, --max_size <MAX_SIZE>
Set a maximum file size (in bytes) to search for duplicate files
-c, --csv_dir <CSV_DIR>
Set the output directory for the CSV file (fdf.csv)
-d, --min_depth <MIN_DEPTH>
Set the minimum depth to search for duplicate files
-D, --max_depth <MAX_DEPTH>
Set the maximum depth to search for duplicate files
-f, --full_path
Prints full path of duplicate files, otherwise relative path
-g, --generate <GENERATOR>
If provided, outputs the completion file for given shell [possible values: bash, elvish, fish, powershell, zsh]
-i, --input_dir <INPUT_DIR>
Set the input directory where to search for duplicate files [default: current directory]
-n, --min_number <MIN_NUMBER>
Minimum 'number of identical files' to be reported
-N, --max_number <MAX_NUMBER>
Maximum 'number of identical files' to be reported
-o, --omit_hidden
Omit hidden files (starts with '.'), otherwise search all files
-r, --result_format <RESULT_FORMAT>
Print the result in the chosen format [default: personal] [possible values: json, yaml, personal]
-s, --sort
Sort result by number of duplicate files, otherwise sort by file size
-t, --time
Show total execution time
-v, --verbose
Show intermediate runtime messages
-w, --wipe_terminal
Wipe (Clear) the terminal screen before listing the duplicate files
-x, --xlsx_dir <XLSX_DIR>
Set the output directory for the XLSX file (fdf.xlsx)
-h, --help
Print help (see more with '--help')
-V, --version
Print version
To build and install from source, run the following command:
cargo install find_duplicate_files
Another option is to install from github:
cargo install --git https://github.com/claudiofsr/find_duplicate_files.git
In general, jwalk (default) is faster than walkdir.
But if you prefer to use walkdir:
cargo install --features walkdir find_duplicate_files