Crates.io | multi-machine-dedup |
lib.rs | multi-machine-dedup |
version | 0.2.0 |
source | src |
created_at | 2022-12-22 22:01:17.831357 |
updated_at | 2022-12-22 22:01:17.831357 |
description | Deduplication tool using SQLite to allow multi-machine features. |
homepage | https://github.com/newca12/multi-machine-dedup |
repository | https://github.com/newca12/multi-machine-dedup |
max_upload_size | |
id | 744140 |
size | 66,634 |
multi-machine-dedup is a deduplication tool using SQLite to allow multi-machine features.
multi-machine-dedup is an EDLA project.
The purpose of edla.org is to promote the state of the art in various domains.
cargo install multi-machine-dedup
Index recursively a directory <DIRECTORY_FULL_PATH> labelled with a <LABEL> in a SQLite database <SQLITE_FILE>
multi-machine-dedup index -l <LABEL> --db <SQLITE_FILE> <DIRECTORY_FULL_PATH>
Check a directory
multi-machine-dedup check-integrity -l <LABEL> --db <SQLITE_FILE>
Compare two databases
multi-machine-dedup compare --db1 <SQLITE_FILE_1> --db2 <SQLITE_FILE_2>
You can use a convenient database tool like DBeaver CE or SQLiteStudio to query the generated SQLite database.
Find top duplicates files larger than <A_SIZE>
select label, full_path, hash,size,nb_dup from file , (select hash, count(*) as nb_dup from file where size > <A_SIZE>
group by hash order by nb_dup DESC, size DESC) as T
where file.hash = T.hash order by nb_dup DESC, size DESC ;
Find all files with the same <CRC_VALUE>
select * from file where hash=<A_CRC_VALUE> ;
Find all files with image/jpeg MIME-type.
select * from hash where mime like "image/jpeg" ;
$Env:LOG='debug'; cargo run ...
remove-item Env:LOG
multi-machine-dedup <SUBCOMMAND> --help
or
multi-machine-dedup help <SUBCOMMAND>
Inspired by https://github.com/hgrecco/dedup multi-machine-dedup will probably propose similar features.
© 2022 Olivier ROLAND. Distributed under the GPLv3 License.