I’m hashing a file with one or more hash algorithms. When I tried to parametrize which hash types I want, it got a lot messier than I was hoping.
I think I’m missing a chance to make better use of generics or LINQ. I also don’t like that I have to use a Type[] as the parameter instead of limiting it to a more specific set of types (HashAlgorithm descendants). I’d like to specify types as the parameter and let this method do the constructing, but maybe this would look better if I had the caller new up instances of HashAlgorithm to pass in?

    public List<string> ComputeMultipleHashesOnFile(string filename, Type[] hashClassTypes)
    {
        var hashClassInstances = new List<HashAlgorithm>();
        var cryptoStreams = new List<CryptoStream>();

        FileStream fs = File.OpenRead(filename);
        Stream cryptoStream = fs;

        // Wrap the file in one CryptoStream per algorithm, so a single read
        // through the outermost stream feeds every hash.
        foreach (var hashClassType in hashClassTypes)
        {
            object obj = Activator.CreateInstance(hashClassType);
            var cs = new CryptoStream(cryptoStream, (HashAlgorithm)obj, CryptoStreamMode.Read);

            hashClassInstances.Add((HashAlgorithm)obj);
            cryptoStreams.Add(cs);

            cryptoStream = cs;
        }

        // Drain the outermost stream; the data itself is discarded.
        CryptoStream cs1 = cryptoStreams.Last();
        byte[] scratch = new byte[1 << 16];
        int bytesRead;
        do { bytesRead = cs1.Read(scratch, 0, scratch.Length); }
        while (bytesRead > 0);

        foreach (var stream in cryptoStreams)
        {
            stream.Close();
        }

        // HexStr is a helper that is not shown here.
        var hashes = new List<string>();
        foreach (var hashClassInstance in hashClassInstances)
        {
            string hash = HexStr(hashClassInstance.Hash).ToLower();
            Console.WriteLine("{0} hash = {1}", hashClassInstance.ToString(), hash);
            hashes.Add(hash);
        }

        // The declared return type requires a value, so return the hex strings.
        return hashes;
    }

Let’s start by breaking the problem down. Your requirement is that you need to compute several different kinds of hashes on the same file. Assume for the moment that you don’t need to actually instantiate the types. Start with a function that has them already instantiated:
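The code for this step is not included here. As a minimal sketch of what such a function could look like, assuming the whole file fits comfortably in memory (the names InMemoryHasher, ComputeHashes and ToHex are hypothetical, chosen only for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class InMemoryHasher
{
    // Hypothetical helper: reads the file into memory once, then hashes
    // that buffer with every algorithm instance supplied by the caller.
    public static List<string> ComputeHashes(
        string fileName, IEnumerable<HashAlgorithm> algorithms)
    {
        byte[] fileBytes = File.ReadAllBytes(fileName);

        return algorithms
            .Select(algorithm => ToHex(algorithm.ComputeHash(fileBytes)))
            .ToList(); // materialize so the hashing happens before returning
    }

    // Hypothetical stand-in for the HexStr helper used in the question.
    private static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}
```

The caller owns the HashAlgorithm instances in this shape, so creating and disposing them stays outside the method.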
That was easy. If the files might be large and you need to stream them (keeping in mind that this will be much more expensive in terms of I/O, just cheaper for memory), you can do that too; it’s just a little more verbose:
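The streaming code is likewise not included here. One possible sketch, assuming each algorithm gets its own freshly opened FileStream and uses HashAlgorithm.ComputeHash(Stream), which is why the file is read once per algorithm (StreamingHasher and ToHex are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class StreamingHasher
{
    // Hypothetical helper: opens the file once per algorithm and lets
    // ComputeHash(Stream) read it in chunks, so memory use stays small
    // at the cost of repeated I/O.
    public static List<string> ComputeHashes(
        string fileName, IEnumerable<HashAlgorithm> algorithms)
    {
        return algorithms
            .Select(algorithm =>
            {
                using (FileStream stream = File.OpenRead(fileName))
                {
                    return ToHex(algorithm.ComputeHash(stream));
                }
            })
            .ToList();
    }

    // Hypothetical stand-in for the HexStr helper used in the question.
    private static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}
```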
It’s a little gnarly and in the second case it’s highly questionable whether or not the Linq-ified version is any better than an ordinary foreach loop, but hey, we’re having fun, right?
Now that we’ve disentangled the hash-generation code, instantiating them first isn’t really that much more difficult. Again we’ll start with code that’s clean – code that uses delegates instead of types:
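The delegate-based code is not included here. A rough sketch of the idea, assuming the caller passes Func<HashAlgorithm> factories and the method invokes them itself (DelegateHasher, ComputeHashes and ToHex are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DelegateHasher
{
    // Hypothetical helper: each factory is invoked inside the method, so
    // construction can happen here, but a factory is equally free to hand
    // back an instance that already exists.
    public static List<string> ComputeHashes(
        string fileName, IEnumerable<Func<HashAlgorithm>> factories)
    {
        byte[] fileBytes = File.ReadAllBytes(fileName);

        return factories
            .Select(create => create())                           // obtain an algorithm
            .Select(algorithm => algorithm.ComputeHash(fileBytes)) // hash the buffer
            .Select(ToHex)
            .ToList();
        // Note: this sketch never disposes the algorithms it obtains.
    }

    // Hypothetical stand-in for the HexStr helper used in the question.
    private static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}
```

Because the parameter is typed as Func<HashAlgorithm>, the compiler rejects anything that does not produce a HashAlgorithm, which a Type[] parameter cannot guarantee.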
Now this is much nicer, and the benefit is that it allows instantiation of the algorithms within the method, but doesn’t require it. We can invoke it like so:
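The invocation itself is not included here. A hedged sketch of what a call might look like, with a compact factory-based method repeated so the sample compiles on its own (DelegateDemo, ComputeHashes and PrintHashes are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DelegateDemo
{
    // Compact, hypothetical factory-based method, included only so that
    // the call below has something to bind to.
    public static List<string> ComputeHashes(
        string fileName, params Func<HashAlgorithm>[] factories)
    {
        byte[] fileBytes = File.ReadAllBytes(fileName);
        return factories
            .Select(create => create().ComputeHash(fileBytes))
            .Select(hash => BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant())
            .ToList();
    }

    // Example call site: the lambdas create fresh instances, but a lambda
    // could just as well return an instance the caller already holds.
    public static void PrintHashes(string fileName)
    {
        List<string> hashes = ComputeHashes(
            fileName,
            () => MD5.Create(),
            () => SHA256.Create());

        foreach (string hash in hashes)
        {
            Console.WriteLine(hash);
        }
    }
}
```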
If we really, really, desperately need to start from the actual Type instances, which I’d try not to do because it breaks compile-time type checking, then we can do that as the last step:
And that’s it. Now we can run this (bad) code:
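Neither the Type-based step nor the call is included here. A sketch of both, assuming the Type overload simply turns each Type into a factory and forwards to a factory-based method (all names are hypothetical, and SHA1Managed and SHA256Managed are used only as examples of concrete types with public parameterless constructors; newer .NET versions mark them obsolete):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class TypeHasher
{
    // Compact, hypothetical factory-based method that does the hashing.
    public static List<string> ComputeHashes(
        string fileName, IEnumerable<Func<HashAlgorithm>> factories)
    {
        byte[] fileBytes = File.ReadAllBytes(fileName);
        return factories
            .Select(create => create().ComputeHash(fileBytes))
            .Select(hash => BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant())
            .ToList();
    }

    // Last step: accept raw Type objects. The check below happens at run
    // time, which is exactly the compile-time safety being given up.
    public static List<string> ComputeHashesFromTypes(
        string fileName, params Type[] hashClassTypes)
    {
        IEnumerable<Func<HashAlgorithm>> factories = hashClassTypes.Select(type =>
        {
            if (!typeof(HashAlgorithm).IsAssignableFrom(type))
            {
                throw new ArgumentException(type + " does not derive from HashAlgorithm.");
            }
            // Abstract types such as MD5 or SHA256 cannot be created this way.
            return new Func<HashAlgorithm>(() => (HashAlgorithm)Activator.CreateInstance(type));
        });

        return ComputeHashes(fileName, factories);
    }

    // Example call: nothing stops a caller from passing typeof(string),
    // which would only fail when the code runs.
    public static List<string> HashWithSha1AndSha256(string fileName)
    {
        return ComputeHashesFromTypes(fileName, typeof(SHA1Managed), typeof(SHA256Managed));
    }
}
```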
At the end of the day, this seems like more code but it’s only because we’ve composed the solution effectively in a way that’s easy to test and maintain. If we wanted to do this all in a single Linq expression, we could:
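The single-expression code is not included here. One way it might be written, as a sketch that assumes the file fits in memory and that every Type is a concrete HashAlgorithm descendant (LinqHasher is a hypothetical class name; the method name is borrowed from the question):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class LinqHasher
{
    // Instantiate, hash and format in one chained expression.
    public static List<string> ComputeMultipleHashesOnFile(
        string fileName, params Type[] hashClassTypes)
    {
        byte[] fileBytes = File.ReadAllBytes(fileName);

        return hashClassTypes
            .Select(type => (HashAlgorithm)Activator.CreateInstance(type))
            .Select(algorithm => algorithm.ComputeHash(fileBytes))
            .Select(hash => BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant())
            .ToList();
        // Note: the algorithm instances created here are never disposed.
    }
}
```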
That’s all there really is to it. I’ve skipped the delegated “selector” step in this final version because if you’re writing this all as one function you don’t need the intermediate step; the reason for having it as a separate function earlier is to give as much flexibility as possible while still maintaining compile-time type safety. Here we’ve sort of thrown it away to get the benefit of terser code.
Edit: I will add one thing, which is that although this code looks prettier, it actually leaks the unmanaged resources used by the HashAlgorithm descendants. You really need to do something like this instead:
And again we’re kind of losing clarity here. It might be better to just construct the instances first, then iterate through them with foreach and yield return the hash strings. But you asked for a Linq solution, so there you are. 😉
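For comparison, the foreach and yield return alternative could look roughly like the sketch below. It assumes the instances are built up front from concrete HashAlgorithm types and disposed in a finally block once enumeration ends (DisposingHasher, ComputeHashes and ToHex are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DisposingHasher
{
    // Iterator method: nothing runs until the result is enumerated.
    public static IEnumerable<string> ComputeHashes(
        string fileName, params Type[] hashClassTypes)
    {
        byte[] fileBytes = File.ReadAllBytes(fileName);

        // Construct every instance first.
        List<HashAlgorithm> algorithms = hashClassTypes
            .Select(type => (HashAlgorithm)Activator.CreateInstance(type))
            .ToList();

        try
        {
            foreach (HashAlgorithm algorithm in algorithms)
            {
                yield return ToHex(algorithm.ComputeHash(fileBytes));
            }
        }
        finally
        {
            // Runs when enumeration completes or the enumerator is disposed,
            // releasing the resources held by each algorithm.
            foreach (HashAlgorithm algorithm in algorithms)
            {
                algorithm.Dispose();
            }
        }
    }

    // Hypothetical stand-in for the HexStr helper used in the question.
    private static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}
```

Callers that want the results immediately can append ToList() to the call, which also guarantees the finally block has run by the time the list is returned.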