I have to sync large files across several machines. The files can be up to 6 GB in size. The sync will be done manually every few weeks. I can't take the filenames into consideration because they can change at any time.
My plan is to create checksums on both the source PC and the destination PC, and then copy to the destination every file whose checksum is not already present there.
My first attempt was something like this:
    using System;
    using System.IO;
    using System.Security.Cryptography;

    private static string GetChecksum(string file)
    {
        using (FileStream stream = File.OpenRead(file))
        using (SHA256Managed sha = new SHA256Managed())
        {
            // Hash the whole file and return the digest as an uppercase hex string.
            byte[] checksum = sha.ComputeHash(stream);
            return BitConverter.ToString(checksum).Replace("-", String.Empty);
        }
    }
The problem was the runtime:
- SHA256 on a 1.6 GB file -> 20 minutes
- MD5 on a 1.6 GB file -> 6.15 minutes
Is there a better, faster way to get the checksum (maybe with a better hash function)?
The problem here is that SHA256Managed reads 4096 bytes at a time (inherit from FileStream and override Read(byte[], int, int) to see how much it reads from the file stream), which is too small a buffer for disk I/O.
To speed things up (2 minutes for hashing a 2 GB file on my machine with SHA256, 1 minute for MD5), wrap the FileStream in a BufferedStream and set a reasonably sized buffer (I tried a ~1 MB buffer):
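A minimal sketch of that wrapping, under a few assumptions: the class name ChecksumSketch and the method name GetChecksumBuffered are hypothetical, the 1 MB default buffer is only a starting point to tune per machine, and SHA256.Create() stands in for SHA256Managed, which is marked obsolete in newer .NET versions.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChecksumSketch
{
    // Hypothetical helper: hashes a file through a BufferedStream so the
    // underlying FileStream is read in large chunks instead of small ones.
    // bufferSize defaults to 1 MB, which is an assumption, not a measured optimum.
    public static string GetChecksumBuffered(string file, int bufferSize = 1024 * 1024)
    {
        using (FileStream fileStream = File.OpenRead(file))
        using (var buffered = new BufferedStream(fileStream, bufferSize))
        using (SHA256 sha = SHA256.Create())
        {
            // ComputeHash pulls from the buffered stream; the buffer is
            // refilled from disk bufferSize bytes at a time.
            byte[] checksum = sha.ComputeHash(buffered);
            return BitConverter.ToString(checksum).Replace("-", string.Empty);
        }
    }
}
```

Calling ChecksumSketch.GetChecksumBuffered(path) returns the same uppercase hex string format as the original GetChecksum. Whether the buffer size helps, and by how much, depends on the disk and runtime, so it is worth timing a few values on the actual machines.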