Let’s say I’m trying to identify duplicate files in a file system. Is it safe to assume that if two files’ SHA1 checksums match, the files are identical? Or should I also compare their contents when the checksums match?
I’ve read that the theoretical complexity of an attack is 2^51 hash function calls. I’ve also read on SO that “For SHA1, which outputs 160 bits, the birthday attack reduces the complexity to 2^80. This should be safe for 30 years or more.” Should I still double-check that the file contents match? I just want to make sure my assignment won’t produce erroneous output when it’s run under a test script.
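Those attack figures are about deliberately constructed collisions. For files that nobody crafted to collide, there’s a 1 in 2^160 chance that two given messages have the same hash (since SHA-1 produces a 160-bit hash).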
Even if you have a million entries in your file system, that’s still only about a 1 in 10^42 chance that a new entry will share a hash with an existing one.
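If you want to check that arithmetic yourself, here is a rough sketch. It assumes SHA-1 output behaves like a uniformly random 160-bit value for ordinary files, and the names (`n_existing`, `p_new`, `p_any`) are just mine for illustration. The second figure is the usual birthday-style estimate for any pair among all the files colliding.

```python
from fractions import Fraction

HASH_BITS = 160          # SHA-1 digest length in bits
n_existing = 10**6       # assumed number of files already hashed

# Chance that one new file's hash equals any of the existing hashes
# (union bound over n_existing independent 1-in-2^160 events).
p_new = Fraction(n_existing, 2**HASH_BITS)

# Chance that any pair among all the files collides
# (birthday approximation: number of pairs / 2^160).
pairs = n_existing * (n_existing - 1) // 2
p_any = Fraction(pairs, 2**HASH_BITS)

print(f"new entry collides: about 1 in {float(1 / p_new):.2e}")
print(f"any pair collides:  about 1 in {float(1 / p_any):.2e}")
```

The first number comes out on the order of 10^42, matching the estimate above. Even the "any pair" figure stays astronomically small for a million files.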
SHA-1 has proved to be fairly good, so I don’t think you need to worry about accidental collisions at all. If you want extra assurance, you can also compare additional attributes such as the file size or a timestamp.
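For what it’s worth, here is a minimal sketch of how that could look in Python. It is only one way to do it, not a reference implementation: `sha1_of` and `find_duplicates` are hypothetical helper names, the size check stands in for the "extra attribute" idea, and the optional `verify` step does a byte-for-byte comparison with `filecmp.cmp(..., shallow=False)` in case you want to be certain for your test script.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, verify=True):
    """Return lists of paths under root whose contents are identical."""
    # Step 1: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: within each size group, group by SHA-1 digest.
    by_key = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_key[(size, sha1_of(path))].append(path)

    # Step 3 (optional): confirm candidates byte for byte.
    groups = []
    for paths in by_key.values():
        if len(paths) < 2:
            continue
        if not verify:
            groups.append(sorted(paths))
            continue
        classes = []  # each class holds files proven identical
        for path in sorted(paths):
            for cls in classes:
                if filecmp.cmp(cls[0], path, shallow=False):
                    cls.append(path)
                    break
            else:
                classes.append([path])
        groups.extend(cls for cls in classes if len(cls) > 1)
    return groups
```

Calling `find_duplicates("some/dir")` returns a list of groups, each a sorted list of paths with identical contents. With `verify=False` it trusts size plus SHA-1 alone, which for an assignment is almost certainly enough; the byte comparison only costs extra reads on files that already look like duplicates.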