Say I have a file A.doc.
Then I copy it to b.doc and move it to another directory.
For me, it is still the same file.
But how can I determine that it is?
When I download files I sometimes read about getting the MD5 something or the checksum, but I don't know what that is about.
Is there a way to check whether these files are binary equal?
If you want to be 100% sure that the files contain exactly the same bytes, then opening two streams and comparing the files byte by byte is the only way.
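Here is a minimal sketch of that exact check. It is written in Python rather than with the .NET stream classes this answer refers to, and the function name is hypothetical. It reads one byte at a time from each file and returns as soon as two bytes differ or one file ends before the other:

```python
def files_equal_bytewise(path_a: str, path_b: str) -> bool:
    """Exact comparison: read one byte at a time, stop at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            byte_a = fa.read(1)
            byte_b = fb.read(1)
            if byte_a != byte_b:
                # Either a differing byte, or one file ended before the other.
                return False
            if not byte_a:
                # Both files reached end-of-file at the same time with no difference.
                return True
```

Reading a single byte per call keeps the logic obvious but is slow, which is the problem the edit further down describes.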
If you just want to be pretty sure (99.9999%?), I would calculate an MD5 hash of each file and compare the hashes instead. Check out System.Security.Cryptography.MD5CryptoServiceProvider. In my testing, if the files are usually equivalent, comparing MD5 hashes is about three times faster than comparing each byte of the file.
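A rough equivalent of the hash-and-compare idea, using Python's hashlib in place of MD5CryptoServiceProvider. The function names and the 64 KiB chunk size are illustrative choices, not taken from the original answer. Hashing in chunks keeps memory use flat even for large files:

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_equal_md5(path_a: str, path_b: str) -> bool:
    """Probable equality: identical files always match; different files almost never do."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

The hex digest is the same kind of value that download pages publish as an "MD5 checksum": if the digest of your downloaded file matches the published one, the bytes are almost certainly the same.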
If the files are usually different, comparing byte by byte will be much faster, because you don't have to read the whole file: you can stop as soon as a single byte differs.
Edit: I originally based this answer on a quick test that read from each file byte by byte and compared them byte by byte. I wrongly assumed that the buffering in System.IO.FileStream would save me from worrying about hard disk block sizes and read speeds; this was not true. I retested with a program that reads each file in 4096-byte chunks and then compares the chunks. This method is slightly faster overall than MD5 even when the files are exactly the same, and will of course be much faster when they differ.
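A sketch of the chunked approach in Python, assuming the same 4096-byte chunk size. The function name is hypothetical, and the up-front size check is an extra shortcut that the original test did not mention:

```python
import os

CHUNK_SIZE = 4096  # same chunk size as the retest described above


def files_equal_chunked(path_a: str, path_b: str, chunk_size: int = CHUNK_SIZE) -> bool:
    """Exact comparison: read both files in fixed-size chunks, stop at the first mismatch."""
    # Extra shortcut (not part of the original test): files of different sizes
    # cannot be identical, and checking the size does not read any content.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # early exit: the rest of the files is never read
            if not chunk_a:
                return True  # both files reached EOF with every chunk equal
```

If you are working in Python anyway, the standard library's `filecmp.cmp(path_a, path_b, shallow=False)` performs a similar size check followed by a chunked content comparison.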
I'm leaving this answer as a mild warning about the FileStream class, and because I still think it has some value as an answer to "how do I calculate the MD5 of a file in .NET". Apart from that, though, it's not the best way to fulfill the original request.
Example of calculating the MD5 hashes of two files (now tested!):