I am aware that MD5 hashes are no longer advisable for security, but I have been using them as a checksum to make sure that a file has not been corrupted after a download or transfer, which I thought was still OK. However, after using this method on a file bigger than a gigabyte, I found that the stored and generated hashes did not match. This was after I had transferred it from one computer to another via a USB stick. I’ve searched online and found a couple of references to large files possibly producing inconsistent hashes, but I didn’t see anything conclusive.
I am using ComputeHash(Stream inputStream) of MD5CryptoServiceProvider to create the hash before and after transfer, so it should not be a case of the byte format being messed up between different languages or something similar. I also tried building the hash from the file again, and the second time it seemed to produce a matching hash. Did I just get unlucky and actually corrupt the file while copying it on and off the USB stick? Or is this a known problem with MD5, and should I ditch it completely? If so, what would be the best replacement, ideally one that is also available as standard in C#? Is SHA1 the next best option?
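The MD5 hash of some data will always be exactly the same as a second MD5 hash of exactly the same data, regardless of the size of that data. The only theoretical problem is a collision, where two different files produce the same hash. For accidental corruption this is ludicrously unlikely (on the order of 1 in 2^128), and in any case a collision would make a corrupted file look intact; it could never make an intact file fail the check.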
The same applies to SHA1 and any other hash algorithm, since each one maps a large data space onto a much smaller hash space.
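It sounds significantly more likely that corruption occurred during the transfer, either on the USB bus or on the flash device itself. Since hashing the file a second time produced a match, the first mismatch may also have been a transient read error rather than permanent damage to the file.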
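To tell a bad read apart from a genuinely damaged file, one option is to hash the file twice and compare both results against the stored value. The following is only a rough sketch of that idea: the class name ChecksumSketch and the methods ComputeMd5Hex and Verify are made up for illustration, and it assumes the stored hash is kept as a hex string. It uses MD5.Create() and streams the file, so the file size should not matter.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper; the class and method names are illustrative only.
public static class ChecksumSketch
{
    // Streams the file through MD5, so even multi-gigabyte files
    // are never loaded into memory in one piece.
    public static string ComputeMd5Hex(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            // BitConverter yields "AB-CD-..."; strip the dashes and lowercase it.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Hashes the file twice and compares the result with the stored value.
    // If the two fresh hashes disagree with each other, the reads themselves
    // are unreliable, which points at the device or bus rather than at MD5.
    public static bool Verify(string path, string expectedHex)
    {
        string first = ComputeMd5Hex(path);
        string second = ComputeMd5Hex(path);
        if (!string.Equals(first, second, StringComparison.OrdinalIgnoreCase))
        {
            throw new IOException("Two reads of the same file produced different hashes.");
        }
        return string.Equals(first, expectedHex, StringComparison.OrdinalIgnoreCase);
    }
}
```

Calling ChecksumSketch.Verify(path, storedHash) on the copied file returns false if the copy really differs from the original. Bear in mind that the second read may be served from the operating system's file cache, so matching hashes on the same machine do not prove the storage device is healthy. If you would rather move off MD5 anyway, swapping MD5.Create() for SHA256.Create() should be the only change needed in this sketch.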