I’m trying to design a simple C# application that calculates a file’s CRC32, MD5, SHA-1, SHA-256, SHA-384 and SHA-512 hashes, and I’ve run into a bit of a roadblock.
I would like to do this as efficiently as possible, so my original thought was to read the file into a MemoryStream before processing it, but I soon found out that very large files make me run out of memory very quickly. So it seems I have to use a FileStream instead. The problem, as I see it, is that only one hash function can run over the stream at a time, so each hash needs its own pass over the file, and each pass takes a while on a large file.
How might I go about reading a small chunk of the file into memory, processing it with all six algorithms, and then moving on to the next chunk? Or does hashing not work that way?
This was my original attempt at reading a file into memory. It failed when I tried to read a CD image into memory before running the hashing algorithms on the MemoryStream:
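private void ReadToEndOfFile(string filename)
{
    if (!File.Exists(filename))
        return;

    byte[] buffer = new byte[16 * 1024];
    this.toolStripStatusLabel1.Text = "Reading File...";

    // Dispose the FileStream when done so the file handle is released.
    using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        // Round up so the final partial chunk can't push Value past Maximum.
        this.toolStripProgressBar1.Maximum = (int)((fs.Length + buffer.Length - 1) / buffer.Length);
        this.toolStripProgressBar1.Value = 0;

        // Deliberately not in a using block: disposing it here would leave _ms
        // pointing at a closed stream. The whole file still ends up in memory,
        // which is what fails for large files such as CD images.
        MemoryStream ms = new MemoryStream();
        int read;
        while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
            this.toolStripProgressBar1.Value += 1;
        }
        ms.Position = 0; // rewind so hashing starts at the beginning
        _ms = ms;
    }
}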
You’re most of the way there; you just don’t need to read the whole thing into memory at once.
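For a single algorithm you don’t even need your own read loop: HashAlgorithm.ComputeHash has an overload that takes a Stream and reads it in chunks internally, so memory use stays small however large the file is. Here is a minimal sketch, where StreamHashing and Sha256OfFile are hypothetical names; note that it still costs one full pass over the file per algorithm.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper class, for illustration only.
public static class StreamHashing
{
    // Hashes a file with SHA-256 without loading it into memory:
    // ComputeHash(Stream) reads from the stream in small chunks until it ends.
    public static string Sha256OfFile(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(fs);
            // Format as lowercase hex, e.g. "ba7816bf..." for the input "abc".
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```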
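All of the built-in hash algorithms in .NET (MD5, SHA1, SHA256, SHA384 and SHA512) derive from the HashAlgorithm class. This has two methods on it: TransformBlock and TransformFinalBlock. So, you should be able to read a chunk of your file, stuff it into the TransformBlock method of whichever hashes you want to use, and then move on to the next chunk. Just remember to call TransformFinalBlock for your last chunk from the file (passing an empty array is fine once you’ve reached the end of the stream). TransformFinalBlock itself only returns a copy of its input; once it has been called, the byte array containing the hash is available from the algorithm’s Hash property. Note that System.Security.Cryptography has no CRC32 class, so CRC32 needs a separate implementation; if you write it as a HashAlgorithm subclass, it can be fed the same chunks.
For now, I would just do each hash one at a time until it’s working, then worry about running the hashes concurrently (using something like the Task Parallel Library).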
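A minimal sketch of that single-pass approach, assuming the System.Security.Cryptography types and the same 16 KB buffer as the question’s code: the file is read once, and every chunk is passed to all six hashers before the next chunk is read. Because the framework has no CRC32 HashAlgorithm, the sketch includes a small table-driven Crc32 subclass. Crc32, MultiHasher and HashFile are hypothetical names, and the CRC variant assumed is the common IEEE one (reflected polynomial 0xEDB88320), which may not be the one your tools expect.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: CRC-32 (IEEE, reflected polynomial 0xEDB88320) as a
// HashAlgorithm, so it can be fed with TransformBlock like the built-in hashes.
public sealed class Crc32 : HashAlgorithm
{
    private static readonly uint[] Table = BuildTable();
    private uint _crc = 0xFFFFFFFFu;

    public Crc32() { HashSizeValue = 32; }

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    public override void Initialize() => _crc = 0xFFFFFFFFu;

    // Called for every chunk passed to TransformBlock/TransformFinalBlock.
    protected override void HashCore(byte[] array, int ibStart, int cbSize)
    {
        for (int i = ibStart; i < ibStart + cbSize; i++)
            _crc = Table[(_crc ^ array[i]) & 0xFF] ^ (_crc >> 8);
    }

    protected override byte[] HashFinal()
    {
        uint value = ~_crc;
        _crc = 0xFFFFFFFFu; // reset so the instance can be reused
        // Big-endian, so the hex string reads like the usual "cbf43926" form.
        return new[] { (byte)(value >> 24), (byte)(value >> 16), (byte)(value >> 8), (byte)value };
    }
}

// Hypothetical helper class, for illustration only.
public static class MultiHasher
{
    // Reads the file once in bufferSize chunks and feeds each chunk to every hasher.
    public static Dictionary<string, string> HashFile(string path, int bufferSize = 16 * 1024)
    {
        var hashers = new Dictionary<string, HashAlgorithm>
        {
            ["CRC32"] = new Crc32(),
            ["MD5"] = MD5.Create(),
            ["SHA1"] = SHA1.Create(),
            ["SHA256"] = SHA256.Create(),
            ["SHA384"] = SHA384.Create(),
            ["SHA512"] = SHA512.Create(),
        };
        try
        {
            byte[] buffer = new byte[bufferSize];
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
            {
                int read;
                while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // Only 'read' bytes are valid; the last chunk is usually shorter.
                    foreach (var h in hashers.Values)
                        h.TransformBlock(buffer, 0, read, null, 0);
                }
            }

            var results = new Dictionary<string, string>();
            foreach (var pair in hashers)
            {
                // An empty final block finishes the hash; the result is in .Hash.
                pair.Value.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
                results[pair.Key] = BitConverter.ToString(pair.Value.Hash).Replace("-", "").ToLowerInvariant();
            }
            return results;
        }
        finally
        {
            foreach (var h in hashers.Values)
                h.Dispose();
        }
    }
}
```

Usage would look something like var hashes = MultiHasher.HashFile(path); followed by hashes["SHA256"], where path is whatever file you are checking. Only one buffer of bufferSize bytes is held in memory at a time, regardless of file size. The chunks are hashed sequentially to keep the sketch simple; the inner foreach over the hashers is the natural place to fan out if you later try the Task Parallel Library.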