| Crates.io | dedups |
| lib.rs | dedups |
| version | 0.0.25 |
| created_at | 2025-05-16 12:08:19.479153+00 |
| updated_at | 2025-05-16 17:28:39.547947+00 |
| description | A fast and efficient file deduplication tool with support for media files |
| homepage | |
| repository | https://github.com/AtlasPilotPuppy/dedup |
| max_upload_size | |
| id | 1676393 |
| size | 417,721 |
A high-performance duplicate file finder and manager written in Rust. dedups efficiently identifies duplicate files using parallel processing and provides both a command-line interface and an interactive Terminal User Interface (TUI) for managing the results.
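As a rough illustration of what hash-based duplicate detection involves, the sketch below groups files by size first and hashes only the files whose size collides with another file. It is not dedups' actual implementation: the function names `file_digest` and `find_duplicates` are hypothetical, SHA-256 from the Python standard library stands in for the tool's hashing algorithms, and the scan is single-threaded rather than parallel.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under `root` that share identical content."""
    # Pass 1: bucket by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only the files whose size collides with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) > 1:
            for path in paths:
                by_hash[(size, file_digest(path))].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Bucketing by size first keeps the expensive step (reading file contents) limited to files that could plausibly be duplicates.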
# Download and install the latest release
curl -sSL https://raw.githubusercontent.com/AtlasPilotPuppy/dedup/main/install.sh | bash
Or run this one-liner to install manually:
curl -sSL https://raw.githubusercontent.com/AtlasPilotPuppy/dedup/main/install.sh > install.sh && chmod +x install.sh && ./install.sh
The script installs the binary to /usr/local/bin (or ~/.local/bin if no sudo access).
Alternatively, install with Cargo:
cargo install dedup
# Clone the repository
git clone https://github.com/AtlasPilotPuppy/dedup
cd dedup
# Build in release mode
cargo build --release
# The binary will be available at target/release/dedup
When using dedups on Windows, please note the following limitations:
Path Length: Windows has a default path length limit of 260 characters. While dedups can handle longer paths, you may need to enable long path support in Windows:
Run git config --system core.longpaths true if using Git
Use the \\?\ prefix for paths longer than 260 characters
File Permissions: Windows file permissions are more restrictive than those on Unix-like systems:
Media Processing: Media deduplication on Windows requires:
Performance: Windows performance may be slightly lower than on Unix-like systems due to:
Configuration: The configuration file location is different:
C:\Users\<username>\.deduprc
# Find duplicates in the current directory using the TUI
dedups -i
# Find duplicates in a specific directory
dedups /path/to/directory
# Find and delete duplicates (non-interactive)
dedups /path/to/directory --delete --mode newest_modified
# Use a custom config file
dedups /path/to/directory --config-file /path/to/my-config.toml
# Copy missing files from source to target directory
dedups /source/directory /target/directory
# Explicitly specify a target directory (can be useful with multiple source directories)
dedups /source/dir1 /source/dir2 --target /target/directory
# Deduplicate between directories and copy missing files
dedups /source/directory /target/directory --deduplicate
# Find duplicates in both source and target (without copying)
# and save the results to a file
dedups /source/directory /target/directory --deduplicate -o duplicates.json
# Copy missing files from multiple source directories to a target
dedups /source/dir1 /source/dir2 /source/dir3 /target/directory
# First deduplicate the target, then copy unique files from source
# (run as separate commands)
dedups /target/directory --delete --mode newest_modified
dedups /source/directory /target/directory
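The copy-missing behavior can be pictured with the following minimal sketch. It assumes "missing" means "no file with the same content exists anywhere in the target", which the text above does not spell out; the function names `content_hash` and `copy_missing`, the use of SHA-256, and the rule of never overwriting an existing target path are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash the full file content (fine for a sketch; chunk large files in practice)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_missing(source: Path, target: Path, dry_run: bool = False) -> list[Path]:
    """Copy files from `source` whose content is absent from `target`.

    Returns the relative paths that were (or would be) copied.
    """
    # Index the content that already exists anywhere in the target tree.
    seen = {content_hash(p) for p in target.rglob("*") if p.is_file()}
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        digest = content_hash(src)
        if digest in seen:
            continue  # same content already exists somewhere in target
        rel = src.relative_to(source)
        dest = target / rel
        if dest.exists():
            continue  # assumption: never overwrite a different file at the same path
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
        seen.add(digest)
        copied.append(rel)
    return copied
```

Calling `copy_missing(src, dst, dry_run=True)` first reports what would be copied without touching the target, in the spirit of the `--dry-run` option.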
The media deduplication feature can detect similar images, videos, and audio files even when they have different formats, resolutions, or quality levels.
# Enable media deduplication mode
dedups /path/to/media --media-mode
# Set resolution preference (highest, lowest, or custom resolution)
dedups /path/to/media --media-mode --media-resolution highest
dedups /path/to/media --media-mode --media-resolution lowest
dedups /path/to/media --media-mode --media-resolution 1280x720
# Set format preferences (comma-separated, highest priority first)
dedups /path/to/media --media-mode --media-formats raw,png,jpg
# Adjust similarity threshold (0-100, default: 90)
dedups /path/to/media --media-mode --media-similarity 85
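To make the 0-100 similarity threshold concrete, here is one common way such a score can be computed: compare two fixed-width perceptual fingerprints bit by bit and report the percentage of matching bits. How dedups actually fingerprints and compares media is not described here, so the 64-bit fingerprint and the function names `similarity_percent` and `is_similar` are assumptions for illustration only.

```python
def similarity_percent(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Percentage of matching bits between two fixed-width fingerprints."""
    mask = (1 << bits) - 1
    differing = bin((hash_a ^ hash_b) & mask).count("1")  # Hamming distance
    return 100.0 * (bits - differing) / bits


def is_similar(hash_a: int, hash_b: int, threshold: float = 90.0) -> bool:
    """Treat two fingerprints as the same media when they meet the threshold."""
    return similarity_percent(hash_a, hash_b) >= threshold
```

Under this model, two fingerprints that differ in 8 of 64 bits score 87.5: they would not match at the default threshold of 90 but would match at 85.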
Professional Photography:
dedups /path/to/photos --media-mode --media-resolution highest --media-formats raw,tiff,png,jpg
Web/Mobile Optimization:
dedups /path/to/images --media-mode --media-resolution 1920x1080 --media-formats webp,jpg,png
Audio Collection:
dedups /path/to/audio --media-mode --media-formats flac,mp3,ogg
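The resolution and format preferences in the examples above can be read as a ranking over the files in one group of similar media. The sketch below is one plausible interpretation, not the tool's documented behavior: it assumes format preference is applied first and resolution breaks ties, and the names `MediaFile` and `pick_preferred` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFile:
    path: str
    fmt: str      # lowercase extension without the dot, e.g. "png"
    width: int
    height: int


def pick_preferred(files, formats, resolution="highest"):
    """Choose which file of a similar-media group to keep.

    `formats` lists extensions from most to least preferred;
    `resolution` is "highest", "lowest", or a (width, height) target.
    """
    def format_rank(f):
        # Formats not named in the preference list rank after all listed ones.
        return formats.index(f.fmt) if f.fmt in formats else len(formats)

    def resolution_rank(f):
        pixels = f.width * f.height
        if resolution == "highest":
            return -pixels
        if resolution == "lowest":
            return pixels
        target_w, target_h = resolution
        return abs(pixels - target_w * target_h)  # closest to the target wins

    # Assumption: format preference is applied first, resolution breaks ties.
    return min(files, key=lambda f: (format_rank(f), resolution_rank(f)))
```

For example, with formats `["raw", "png", "jpg"]` and the default `"highest"`, a 1920x1080 PNG is preferred over a larger JPEG because PNG ranks higher in the format list.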
A sample script is included to demonstrate the media deduplication features. The script downloads small media files and creates variations with different formats, resolutions, and quality levels.
# Make the script executable
chmod +x sample_media.sh
# Run the script to create sample media files
./sample_media.sh
# Test media deduplication on the sample files (interactive mode)
dedups -i demo --media-mode
# For CLI mode with specific options
dedups --dry-run demo --media-mode --media-resolution highest --media-formats png,jpg,mp4
The script creates the following directory structure:
demo/
├── original # Original media files
├── similar_quality # Same media with different quality levels
├── different_formats # Same media in different file formats
└── resized # Same media with different resolutions
Dependencies for the sample script:
# Find and list duplicates only
dedups /path/to/photos
# Find and immediately delete duplicates, keeping newest files
dedups /path/to/photos --delete --mode newest_modified
# Move duplicates to a separate folder instead of deleting
dedups /path/to/photos --move-to /path/to/duplicates --mode shortest_path
# Export a report of duplicates for review
dedups /path/to/photos -o duplicates.json
# Use file caching for faster repeated scans
dedups /path/to/photos --cache-location ~/.dedup_cache --fast-mode
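The idea behind a hash cache is that a file whose size and modification time are unchanged since the last scan almost certainly has the same content, so its stored hash can be reused. The sketch below shows that idea under stated assumptions: the JSON file layout, the `HashCache` class, and the use of SHA-256 are illustrative and do not describe the cache format dedups writes to `--cache-location`.

```python
import hashlib
import json
from pathlib import Path


class HashCache:
    """Persist file hashes keyed by path, validated by size and mtime."""

    def __init__(self, cache_file: Path):
        self.cache_file = cache_file
        self.entries = json.loads(cache_file.read_text()) if cache_file.exists() else {}
        self.hits = 0  # number of hashes served from the cache

    def digest(self, path: Path) -> str:
        stat = path.stat()
        key = str(path.resolve())
        entry = self.entries.get(key)
        # Reuse the stored hash only if size and mtime are unchanged.
        if entry and entry["size"] == stat.st_size and entry["mtime_ns"] == stat.st_mtime_ns:
            self.hits += 1
            return entry["hash"]
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        self.entries[key] = {"size": stat.st_size, "mtime_ns": stat.st_mtime_ns, "hash": digest}
        return digest

    def save(self) -> None:
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)
        self.cache_file.write_text(json.dumps(self.entries))
```

The trade-off is the usual one for metadata-based caching: rescans become much cheaper, at the cost of missing a change that leaves both size and modification time untouched.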
# Scenario 1: Safely copy missing files from source to target
dedups /source/photos /target/backup
# Scenario 2: Full synchronization with deduplication
# Step 1: Clean duplicates in the target directory
dedups /target/backup --delete --mode newest_modified
# Step 2: Clean duplicates in the source directory
dedups /source/photos --delete --mode newest_modified
# Step 3: Copy missing files from source to target
dedups /source/photos /target/backup
# Scenario 3: One-step operation to deduplicate between directories
dedups /source/photos /target/backup --deduplicate
# Scenario 4: Multiple source directories to one target
dedups /photos/2020 /photos/2021 /photos/2022 /backup/all_photos
USAGE:
    dedups [OPTIONS] [directory]

ARGS:
    <directory>    The directory to scan for duplicate files [default: .]

OPTIONS:
    -d, --delete                    Delete duplicate files automatically based on selection strategy
    -M, --move-to <move-to>         Move duplicate files to a specified directory
    -l, --log                       Enable logging to a file (default: dedup.log)
        --log-file <PATH>           Specify a custom log file path
    -o, --output <o>                Output duplicate sets to a file (e.g., duplicates.json)
    -f, --format <format>           Format for the output file [json|toml] [default: json]
    -a, --algorithm <algorithm>     Hashing algorithm [md5|sha1|sha256|blake3|xxhash|gxhash|fnv1a|crc32] [default: xxhash]
    -p, --parallel <parallel>       Number of parallel threads for hashing (default: auto)
        --mode <mode>               Selection strategy for delete/move [newest_modified|oldest_modified|shortest_path|longest_path] [default: newest_modified]
    -i, --interactive               Run in interactive TUI mode
    -v, --verbose...                Verbosity level (-v, -vv, -vvv)
        --include <include>...      Include specific file patterns (glob)
        --exclude <exclude>...      Exclude specific file patterns (glob)
        --filter-from <filter-from>
                                    Load filter rules from a file (one pattern per line, # for comments)
        --progress                  Show progress bar for CLI scan (TUI has its own progress display)
        --sort-by <sort-by>         Sort files by criterion [name|size|created|modified|path] [default: modifiedat]
        --sort-order <sort-order>
                                    Sort order [asc|desc] [default: descending]
        --raw-sizes                 Display file sizes in raw bytes instead of human-readable format
        --config-file <config-file>
                                    Path to a custom config file
        --dry-run                   Perform a dry run without making any actual changes
        --cache-location <cache-location>
                                    Directory to store file hash cache for faster rescans
        --fast-mode                 Use cached file hashes when available (requires cache-location)
        --media-mode                Enable media deduplication for similar images/videos/audio
        --media-resolution <resolution>
                                    Preferred resolution for media files [highest|lowest|WIDTHxHEIGHT] [default: highest]
        --media-formats <formats>
                                    Preferred formats for media files (comma-separated, e.g., 'raw,png,jpg')
        --media-similarity <threshold>
                                    Similarity threshold percentage for media files (0-100) [default: 90]
    -h, --help                      Print help information
    -V, --version                   Print version information
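The four `--mode` selection strategies can be thought of as different sort keys over a set of duplicates, where the winning file is kept and the rest are deleted or moved. This reading follows the earlier example that pairs `--mode newest_modified` with "keeping newest files"; the sketch itself is an assumption-laden illustration, and the names `FileInfo` and `split_keep_and_remove` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    modified: float  # modification time as a Unix timestamp


def split_keep_and_remove(group, mode="newest_modified"):
    """Return (keeper, others) for one set of duplicate files."""
    # Each mode maps to a key where the smallest value identifies the keeper.
    keys = {
        "newest_modified": lambda f: -f.modified,
        "oldest_modified": lambda f: f.modified,
        "shortest_path": lambda f: len(f.path),
        "longest_path": lambda f: -len(f.path),
    }
    if mode not in keys:
        raise ValueError(f"unknown mode: {mode}")
    keeper = min(group, key=keys[mode])  # ties go to the earliest entry in the group
    return keeper, [f for f in group if f is not keeper]
```

Running a strategy with `--dry-run` first is a sensible way to check which files a given mode would keep before anything is deleted.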
When using --filter-from, the file should follow this format:
# This is a comment
+ *.jpg # Include all jpg files
- *tmp* # Exclude any path containing "tmp"
Lines starting with + are include patterns
Lines starting with - are exclude patterns
Lines starting with # or ; are comments
The TUI mode provides an interactive interface for exploring and managing duplicate sets.
The Settings screen (Ctrl+S) allows you to configure:
--cache-location to store file hashes on disk
--fast-mode to skip hash calculations for unchanged files
dedups supports configuration through a .deduprc file in your home directory. This allows you to set default values that will be used when options are not explicitly specified on the command line.
The configuration file is located at:
~/.deduprc (Linux/macOS)
C:\Users\<username>\.deduprc (Windows)
You can also specify a custom configuration file using the --config-file option:
dedups --config-file /path/to/my-config.toml /path/to/directory
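The precedence described above (command-line values first, then the config file, then built-in defaults) can be sketched as a simple dictionary merge. The key names below are hypothetical and are not taken from the actual .deduprc schema; only the default values (xxhash, newest_modified, json) come from the help text. The function name `resolve_options` is likewise an assumption.

```python
# Built-in defaults, taken from the [default: ...] notes in the help text.
# The key names themselves are hypothetical.
BUILTIN_DEFAULTS = {"algorithm": "xxhash", "mode": "newest_modified", "format": "json"}


def resolve_options(cli_args: dict, config: dict) -> dict:
    """Merge option sources: CLI values win, then config file, then built-ins."""
    resolved = dict(BUILTIN_DEFAULTS)
    resolved.update(config)
    # Only CLI options the user actually passed (non-None) override the config.
    resolved.update({k: v for k, v in cli_args.items() if v is not None})
    return resolved
```

With a config that sets `algorithm` to `blake3` and a command line that passes only `--mode shortest_path`, the merged result uses blake3, shortest_path, and the built-in json output format.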