Are there any famous algorithms to efficiently find duplicates? For e.g. Suppose if I

Question


Asked: May 23, 2026 (2026-05-23T10:24:35+00:00)


Are there any famous algorithms to efficiently find duplicates?

For example, suppose I have thousands of photos, each with a unique name. Duplicates could still exist in different sub-folders. Is using std::map or some hash map a good idea?


1 Answer

Editorial Team · Answer 1 · 2026-05-23T10:24:35+00:00

If you’re dealing with files, one idea is to first check each file’s size, and then generate a hash only for the files that share their size with at least one other file.

Then just compare the files’ hashes. If they’re the same, you’ve got a duplicate file.

There’s a tradeoff between speed and certainty: it can happen, however rarely, that different files have the same hash. So you can improve your solution: generate a simple, fast hash to find the candidates. When the hashes differ, the files are different. When they’re equal, generate a second hash. If the second hashes differ, you just had a false positive. If they’re equal again, you probably have a real duplicate.

In other words:

Get the size of every file.
For each file, check whether any other file has the same size.
If so, generate a fast hash for those files.
Compare the hashes.
If they differ, ignore the files.
If they are equal, generate a second hash.
Compare again.
If the second hashes differ, ignore the files.
If they are equal, you almost certainly have two identical files.
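The hashing steps above could look roughly like the following C++17 sketch, applied to one group of files that already share a size. Everything named here is an assumption for illustration: confirm_duplicates and read_file are hypothetical helpers, FNV-1a stands in for the "fast hash", and djb2 stands in for the "second hash". In practice the second hash would more likely be a stronger one, such as SHA-256 from a library.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Reads a whole file into memory. Fine for a sketch; large files would
// normally be hashed in chunks instead.
std::string read_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// 64-bit FNV-1a, used here as the fast first hash.
std::uint64_t fnv1a(const std::string& data) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// djb2, used here only as a stand-in for an independent second hash.
std::uint64_t djb2(const std::string& data) {
    std::uint64_t h = 5381;
    for (unsigned char c : data) {
        h = h * 33 + c;
    }
    return h;
}

// Hypothetical helper: given files of equal size, returns the groups whose
// members agree on both hashes.
std::vector<std::vector<fs::path>>
confirm_duplicates(const std::vector<fs::path>& same_size) {
    std::map<std::uint64_t, std::vector<fs::path>> by_fast;
    for (const auto& p : same_size) {
        by_fast[fnv1a(read_file(p))].push_back(p);
    }

    std::vector<std::vector<fs::path>> result;
    for (const auto& fast : by_fast) {
        if (fast.second.size() < 2) continue;  // fast hash is unique: ignore

        // Fast hashes match, so compute the second hash for these files only.
        std::map<std::uint64_t, std::vector<fs::path>> by_second;
        for (const auto& p : fast.second) {
            by_second[djb2(read_file(p))].push_back(p);
        }
        for (const auto& second : by_second) {
            if (second.second.size() >= 2) {
                result.push_back(second.second);  // both hashes agree
            }
        }
    }
    return result;
}
```

Each returned group holds files that match on size and on both hashes. If certainty is required, a final byte-by-byte comparison of the files in a group removes the remaining doubt.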

Hashing every file up front takes too much time, and the work is wasted if most of your files are different.
