Crates.io | vid_dup_finder_lib |
lib.rs | vid_dup_finder_lib |
version | |
source | src |
created_at | 2021-10-30 14:33:25.662685 |
updated_at | 2024-12-08 15:48:47.072561 |
description | a library to find near-duplicate video files |
homepage | |
repository | https://github.com/Farmadupe/vid_dup_finder_app |
max_upload_size | |
id | 474401 |
size | 0 |
vid_dup_finder finds near-duplicate video files on disk. It detects videos whose frames look similar, and where the videos are roughly the same length (within ~5%).
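The length criterion can be illustrated with a small sketch. The text above only says "within ~5%", so the exact rule the library applies is not known here; measuring the difference relative to the longer video is an assumption, and `similar_length` is a hypothetical helper, not part of the crate's API.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of vid_dup_finder_lib).
/// Returns true when two durations differ by no more than `tolerance`
/// (a fraction, e.g. 0.05 for 5%) of the longer of the two.
/// Assumption: the difference is measured relative to the longer video.
fn similar_length(a: Duration, b: Duration, tolerance: f64) -> bool {
    let (a, b) = (a.as_secs_f64(), b.as_secs_f64());
    let longer = a.max(b);
    if longer == 0.0 {
        // Two zero-length inputs are trivially the same length.
        return true;
    }
    (a - b).abs() <= tolerance * longer
}
```

With a tolerance of 0.05, a 100-second video and a 96-second video would pass this check, while a 100-second video and a 90-second video would not.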
vid_dup_finder works with most common video file formats (any format supported by FFmpeg).
vid_dup_finder extracts several frames from the first minute of each video. It creates a "perceptual hash" from these frames using both 'Spatial' and 'Temporal' information.
The resulting hashes can then be compared by their Hamming distance. Shorter distances indicate more similar videos.
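As a rough illustration of the comparison step, the sketch below computes the Hamming distance between two hashes stored as slices of 64-bit words. The storage layout and the function name `hamming_distance` are assumptions made for this example; the crate's own hash type and comparison API may look different.

```rust
/// Hypothetical helper (not part of vid_dup_finder_lib).
/// Counts the bits that differ between two hashes of equal length.
/// Returns `None` when the hashes have different lengths, because the
/// distance is only defined for bit strings of the same size.
fn hamming_distance(a: &[u64], b: &[u64]) -> Option<u32> {
    if a.len() != b.len() {
        return None;
    }
    // XOR leaves a 1 in every position where the two words disagree;
    // count_ones() then counts those positions.
    Some(a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum())
}
```

A caller would then treat two videos as likely duplicates when the distance falls below some chosen threshold. A suitable threshold depends on the hash size and is not specified here.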
FFmpeg must be installed on your system and be accessible on the command line.
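One way to confirm that the tool is reachable before processing any files is to try launching it. The sketch below uses only the Rust standard library; `tool_available` is a hypothetical helper written for this example and is not something the crate is known to provide. It relies on the fact that `ffmpeg -version` prints version information and exits successfully.

```rust
use std::process::{Command, Stdio};

/// Hypothetical helper (not part of vid_dup_finder_lib).
/// Returns true if `program` can be started from the current PATH and
/// exits successfully when run with `-version`.
fn tool_available(program: &str) -> bool {
    Command::new(program)
        .arg("-version")
        // Discard the version banner; only the exit status matters.
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        // A spawn error (for example, program not found) counts as unavailable.
        .unwrap_or(false)
}
```

Calling `tool_available("ffmpeg")` at startup lets an application report a clear error message instead of failing later, partway through a scan.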
vid_dup_finder will find duplicates if minor changes have been made to the video, such as resizing, small colour corrections, small crops or faint watermarks. It will not find duplicates if there are larger changes (flipping or rotation, embedding in a corner of a different video, etc.).
To save processing time when working on large datasets, vid_dup_finder uses only frames from the first 30 seconds of any video. vid_dup_finder may return false positives when used on content of the same length that shares a common first 30 seconds (for example, a series of cartoons with a fixed intro sequence).
Because this library only checks the first 30 seconds of each video, two videos that are the same length and share the first 30 seconds of content will be reported as duplicates even if the rest of the content differs. This may occur for TV shows that share opening credits.
Licensed under either of
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.