Are there any famous algorithms to efficiently find duplicates?
For example, suppose I have thousands of photos, each with a unique name. Duplicates could still exist in different sub-folders. Is using std::map or a hash map a good idea?
If you're dealing with files, one idea is to first check each file’s length, and then generate a hash only for the files that have the same size.
Then just compare the files’ hashes. If they’re the same, you’ve very likely got a duplicate file.
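A minimal C++17 sketch of the size-then-hash idea, in case it helps. The names hash_file and find_duplicate_candidates are made up for illustration, and FNV-1a is just one assumed choice of fast, non-cryptographic hash. Any hash would do here.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// FNV-1a over the whole file: fast, not cryptographic (illustrative choice).
std::uint64_t hash_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    char buf[4096];
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        for (std::streamsize i = 0; i < in.gcount(); ++i) {
            h ^= static_cast<unsigned char>(buf[i]);
            h *= 1099511628211ULL;              // FNV-1a 64-bit prime
        }
    }
    return h;
}

// Returns groups of paths whose size AND hash match (candidate duplicates).
std::vector<std::vector<fs::path>> find_duplicate_candidates(const fs::path& root) {
    // Step 1: bucket every regular file by its size; no file contents are read.
    std::unordered_map<std::uintmax_t, std::vector<fs::path>> by_size;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file())
            by_size[entry.file_size()].push_back(entry.path());
    }

    // Step 2: hash only the files that share a size with at least one other file.
    std::vector<std::vector<fs::path>> groups;
    for (const auto& [size, paths] : by_size) {
        if (paths.size() < 2) continue;  // unique size, cannot be a duplicate
        std::unordered_map<std::uint64_t, std::vector<fs::path>> by_hash;
        for (const auto& p : paths) by_hash[hash_file(p)].push_back(p);
        for (auto& [h, same] : by_hash)
            if (same.size() > 1) groups.push_back(std::move(same));
    }
    return groups;
}
```

Calling find_duplicate_candidates("photos") would return one vector of paths per group of likely duplicates. An unordered_map is used because the buckets do not need to be ordered; std::map would also work.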
There’s a tradeoff between speed and accuracy: in rare cases, two different files can have the same hash. So you can improve your solution: generate a simple, fast hash to find the candidate dups. When the hashes are different, you have different files. When they’re equal, generate a second hash. If the second hash is different, you just had a false positive. If they’re equal again, you probably have a real duplicate.
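The two-hash check could look roughly like this in C++17. Everything here is an assumption for illustration: the function names are hypothetical, the cheap first hash is FNV-1a over only the first 4096 bytes, and the second hash is djb2 over the whole file. A real tool would likely use a stronger second hash from a library, or a byte-by-byte comparison, for the final confirmation.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// First, cheap hash: FNV-1a over at most the first `limit` bytes of the file.
std::uint64_t fnv1a_prefix(const fs::path& p, std::uintmax_t limit = 4096) {
    std::ifstream in(p, std::ios::binary);
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    char c;
    while (limit-- > 0 && in.get(c)) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
    }
    return h;
}

// Second hash: djb2 over the entire file (a stand-in for a stronger hash).
std::uint64_t djb2_full(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    std::uint64_t h = 5381;
    char c;
    while (in.get(c)) h = h * 33 + static_cast<unsigned char>(c);
    return h;
}

// True when size, first hash, and second hash all agree.
// "Probably" because hash collisions remain possible, however unlikely.
bool probably_duplicates(const fs::path& a, const fs::path& b) {
    if (fs::file_size(a) != fs::file_size(b)) return false;  // different size
    if (fnv1a_prefix(a) != fnv1a_prefix(b)) return false;    // definitely different
    return djb2_full(a) == djb2_full(b);  // rules out first-hash false positives
}
```

The point of the design is that most non-duplicates are rejected by the size check or the prefix hash, so the full-file pass only runs on the few pairs that survive both.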
In other words: hashing every file will take too much time, and it is wasted work if most of your files are different.