Update video duplicate finder and more #1425
Open
+2,427
−1,033
Performance comparison of using an array versus a `Vec` as the read buffer, at specific buffer sizes, when reading files from disk and calculating their hashes.
This is a fairly realistic scenario. It also uses rayon, which sometimes makes the results less predictable.
My computer has a fairly good CPU but a cheap SATA SSD, so the results show disk
(Time to read files and calculate hashes in parallel, smaller is better)
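A minimal sketch of the two buffer strategies being compared, assuming a stack-allocated array on one side and a heap-allocated `Vec` on the other. The names `hash_with_array`, `hash_with_vec` and `BUF_SIZE` are hypothetical, and `DefaultHasher` from the standard library stands in for whatever hash the benchmark actually used. It is not the benchmark code from this PR and it leaves out rayon.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::io::{self, Read};

/// Buffer size in bytes. Hypothetical value, picked only for illustration.
const BUF_SIZE: usize = 16 * 1024;

/// Hash everything from `reader` using a fixed-size array on the stack.
fn hash_with_array<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut buf = [0u8; BUF_SIZE];
    let mut hasher = DefaultHasher::new();
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}

/// Hash everything from `reader` using a heap-allocated Vec of `buf_size` bytes.
fn hash_with_vec<R: Read>(mut reader: R, buf_size: usize) -> io::Result<u64> {
    let mut buf = vec![0u8; buf_size];
    let mut hasher = DefaultHasher::new();
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}
```

Both functions accept anything that implements `Read`, so a `std::fs::File` can be passed in directly and each call timed with `std::time::Instant` to compare buffer sizes.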
I tried using locks so that at most one file is read from the HDD at a time, but there were no performance gains.
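The locking idea can be sketched roughly as follows, assuming a single global `Mutex` that is held only while a file is read, so that hashing still runs in parallel. The function name `hash_files_serialized_reads` is hypothetical. The sketch uses `std::thread::scope` with one thread per file in place of rayon, and `DefaultHasher` in place of the real hash, so it illustrates the approach rather than the code that was measured.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::PathBuf;
use std::sync::Mutex;
use std::thread;

/// Read each file while holding a shared lock (one disk read at a time),
/// then hash the bytes after the lock is released.
fn hash_files_serialized_reads(paths: &[PathBuf]) -> Vec<io::Result<u64>> {
    let disk_lock = Mutex::new(());
    thread::scope(|scope| {
        let handles: Vec<_> = paths
            .iter()
            .map(|path| {
                let disk_lock = &disk_lock;
                scope.spawn(move || -> io::Result<u64> {
                    let data = {
                        // The guard is dropped at the end of this block,
                        // so only the read itself is serialized.
                        let _guard = disk_lock.lock().unwrap();
                        fs::read(path)?
                    };
                    let mut hasher = DefaultHasher::new();
                    hasher.write(&data);
                    Ok(hasher.finish())
                })
            })
            .collect();
        // Results come back in the same order as `paths`.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Reading whole files into memory keeps the sketch short; for large files a chunked read under the lock would be needed instead.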
There is also an option to speed up image resizing, which gives a significant speedup when checking for similar images.
On my system, hashing 91 files (~3/4 MB each, 3000x4000 JPGs), I sometimes got as much as a 3x speedup. In real-world use the speedup should be smaller, depending on file size (bigger files should see bigger gains) and on the algorithm: nearest and blockhash see almost no speedup, because the first is very simple to resize with and the second does not resize the image before hashing.