Replies: 1 comment
-
It would be super useful for finding similar code files too. @qarmin (GREAT RUST PROJECT btw!)
-
My issue is that I have previous versions of files whose names or sizes are very similar, but not exactly the same. Hash comparison doesn't find them either, because of those discrepancies.
A solution would be to allow a user-defined "margin of error" for the name and size options. For example, two names might count as a match if they are 95% similar (ignoring the extension, which should be identical), or files could be grouped together as long as their sizes fall within a 10% margin of each other. The problem with applying a margin of error to either option on its own is that it would produce a huge number of false positives, making it quite useless.
However, if you were to combine both options, you would get something more reliable. Take for example three files:
Super Special Project.odt (3 MB)
The Super Special Project Final.odt (3.5 MB)
The Super Special Project Final Final.odt (4.0 MB)
It is very clear to a human that they are all versions of the same file made at different dates, but not to czkawka. You could argue that they aren't really the same file, but I disagree: they are just different versions, and if the content is incremental it is safe to remove the older ones. Also, if the versions sit in different folders or are spread across your computer, cleaning them up manually becomes increasingly difficult, so a tool like czkawka is needed.
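To make the idea concrete, here is a rough sketch in Python of what combining the two checks could look like. This is not how czkawka works and none of these names exist in it: `name_similarity`, `size_within_margin`, `likely_versions`, and both default thresholds are made up for illustration, and `difflib.SequenceMatcher` is just one possible stand-in for a "percent similar" measure. The thresholds are parameters because the right values would have to be user-defined.

```python
from difflib import SequenceMatcher
from pathlib import PurePath


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio (0.0 to 1.0) of two file names, ignoring the extension."""
    stem_a = PurePath(a).stem.lower()
    stem_b = PurePath(b).stem.lower()
    return SequenceMatcher(None, stem_a, stem_b).ratio()


def size_within_margin(size_a: int, size_b: int, margin: float) -> bool:
    """True if the sizes differ by at most `margin`, relative to the larger file."""
    larger = max(size_a, size_b)
    if larger == 0:
        return True  # two empty files are trivially the same size
    return abs(size_a - size_b) / larger <= margin


def likely_versions(a, b, min_name_similarity=0.8, size_margin=0.15):
    """Both checks must pass, and the extensions must be identical.

    `a` and `b` are (file name, size in bytes) tuples. The default
    thresholds are arbitrary assumptions, not recommended values.
    """
    name_a, size_a = a
    name_b, size_b = b
    if PurePath(name_a).suffix.lower() != PurePath(name_b).suffix.lower():
        return False
    return (
        name_similarity(name_a, name_b) >= min_name_similarity
        and size_within_margin(size_a, size_b, size_margin)
    )


# The three example files from the post, with sizes in bytes.
first = ("Super Special Project.odt", 3_000_000)
second = ("The Super Special Project Final.odt", 3_500_000)
third = ("The Super Special Project Final Final.odt", 4_000_000)

for older, newer in [(first, second), (second, third)]:
    print(older[0], "<->", newer[0], likely_versions(older, newer))
```

Requiring both checks to pass is what keeps the false positives down: a file with a similar name but a very different size, or a similar size but an unrelated name, is rejected. Comparing each version only with the next one also lets a chain of versions be grouped even when the oldest and newest have drifted further apart.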