| Crates.io | catfish |
| lib.rs | catfish |
| version | 0.1.1 |
| created_at | 2024-12-29 21:52:29.21502+00 |
| updated_at | 2024-12-29 21:57:20.089954+00 |
| description | catfish is a CLI tool that compares two directories by hashing all files. It reports which files are in the 'right' folder but not in 'left', regardless of how things were moved or renamed. Great for making sure your 'left' folder has all the files from the 'right' one. |
| homepage | |
| repository | https://github.com/samvdst/catfish |
| max_upload_size | |
| id | 1498544 |
| size | 20,844 |
Because sometimes files pretend to be something they’re not.
(No matter the name or location, catfish will find out if they’re the same.)
I needed this functionality today and threw this tool together. I decided to share it here, in case someone else finds it useful!
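catfish recursively scans two folders (let's call them "left" and "right") and hashes every file using SHA-256. It then reports which files are in "right" but not in "left", regardless of how they were moved or renamed.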
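The hash-and-compare idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not catfish's actual (Rust) implementation: the helper names `sha256_of`, `hash_tree`, and `missing_from_left` are hypothetical, and details such as symlink handling or output order may differ in the real tool.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Recursively walk root and return (digest, path) pairs sorted by path."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            pairs.append((sha256_of(path), path))
    return sorted(pairs, key=lambda pair: pair[1])


def missing_from_left(left, right):
    """Return (digest, path) pairs in right whose content is nowhere in left."""
    # Only the content hashes of the left tree matter, not names or locations.
    left_digests = {digest for digest, _path in hash_tree(left)}
    return [(d, p) for d, p in hash_tree(right) if d not in left_digests]
```

Because the comparison uses only content hashes, a file that was renamed or moved inside "left" still counts as present, which is what avoids the false positives of a plain path comparison.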
Backstory:
Some time ago, I switched from cloud provider X to cloud provider Y. I had both drives fully synced locally (a full copy, not a "lite" sync), so I copied all my files from X to Y and then turned off sync for X. But I forgot to delete the local X folder, and ended up adding new files to it by mistake. When I went to delete it, a simple path comparison with Y wasn't enough because I'd moved and renamed files in Y, which would have caused a lot of false positives. What I really needed was to find out which files in X didn't exist anywhere in Y - so I could copy them over if necessary, and then safely delete X without losing anything important.
Install from crates.io:
cargo install catfish
Or clone this repo and install from source:
git clone https://github.com/samvdst/catfish.git
cd catfish
cargo install --path .
You can then run catfish from anywhere.
Usage:
catfish [OPTIONS] <LEFT_FOLDER> <RIGHT_FOLDER>
-i, --ignore-duplicates: if there are multiple files with the same hash in RIGHT_FOLDER, only list the first occurrence.
Suppose we have two folders: foo (left) and bar (right). In bar, we have a file that appears twice with identical content.
catfish foo bar
Files in "bar" but not in "foo":
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example.txt
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example_dupe.txt
af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba bar/another_file.txt
If we then run:
catfish foo bar --ignore-duplicates
Files in "bar" but not in "foo":
f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c bar/example.txt
af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba bar/another_file.txt
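The effect of --ignore-duplicates on the listing can be sketched as follows. This is a Python illustration under assumptions, not the tool's real code: `ignore_duplicates` is a hypothetical helper that takes (hash, path) pairs in listing order and keeps the first path seen for each hash.

```python
def ignore_duplicates(pairs):
    """Keep only the first (digest, path) pair seen for each digest."""
    seen = set()
    unique = []
    for digest, path in pairs:
        if digest not in seen:
            seen.add(digest)
            unique.append((digest, path))
    return unique


# The three entries from the first example run, in the order they were printed.
listing = [
    ("f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c", "bar/example.txt"),
    ("f2d30353acf140ed51b1343368255c1201a7ee898acd60b25e207ff75555e12c", "bar/example_dupe.txt"),
    ("af89f7d49b0c8ded732a9a2b3aff738cd1a3c1cd0d3635742adfee47faa31cba", "bar/another_file.txt"),
]

for digest, path in ignore_duplicates(listing):
    print(digest, path)
```

Run on these entries, the sketch drops bar/example_dupe.txt and keeps bar/example.txt and bar/another_file.txt.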
Ideas, improvements, and pull requests are always welcome. But please note: I can’t guarantee that I’ll have much time to work on this. So if you open a PR, thanks in advance for your patience!