I’m working on a specialized on-disk hashtable (prior experiments with Berkeley, ManagedESENT, etc. didn’t pan out). It has a fairly simple chained structure, with each key-value pair (KVP) followed in the file by a long (Int64) value that points to the next KVP in the chain (and uses a value of zero if there isn’t one). I’m using MD5 to generate the hash code.
When profiling the code to assess the speed of adding entries, the hash function is responsible for about 55% of the running time, which isn’t totally surprising. But about 25% of that time is coming from the binForm.Serialize(ms, obj) call in the ObjectToByteArray serialization function. Both functions are shown below. I’m assuming I can’t make any big gains on the hash algorithm itself, but I’m wondering if I can eke some performance out of the serialization function?
    // Compute hash code
    long hash(object s)
    {
        byte[] y = md5.ComputeHash(ObjectToByteArray(s)); // Produces byte[16]
        long z = BitConverter.ToInt64(y, 0);
        long res = z & bitMask;
        return res;
    }

    // Convert an object to a byte array
    private byte[] ObjectToByteArray(Object obj)
    {
        if (obj == null)
            return null;
        MemoryStream ms = new MemoryStream();
        binForm.Serialize(ms, obj);
        return ms.ToArray();
    }
Use protobuf-net; it’s far quicker!
Update
From looking at your code, I assume there is no requirement that computed hashes be consistent across AppDomains? If not, computing your hash code can be as simple as:
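A minimal sketch of that idea, assuming the built-in `GetHashCode()` is what is meant. The class name `HashSketch` and the `BitMask` value are hypothetical (the question uses a `bitMask` but does not show its value). Note that .NET does not guarantee `GetHashCode()` results to be stable across AppDomains, processes, or runtime versions, so this only fits if the on-disk table never has to be reopened by a different run.

```csharp
using System;

static class HashSketch
{
    // Hypothetical mask: keeps the low 20 bits. The real bitMask is not shown in the question.
    const long BitMask = (1L << 20) - 1;

    public static long Hash(object s)
    {
        // GetHashCode() returns an Int32; reinterpret it as unsigned so the
        // widened value is never negative, then apply the mask as before.
        long z = unchecked((uint)s.GetHashCode());
        return z & BitMask;
    }
}
```

This skips both the serialization and the MD5 step, which is where the profiler reports most of the time going.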
For future reference, your MemoryStream should really be in a using block, so that it is disposed deterministically rather than left for the garbage collector:
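A sketch of the question’s `ObjectToByteArray` with the stream wrapped in `using`. The `serialize` delegate is a hypothetical stand-in for `binForm.Serialize(ms, obj)`, so that the snippet compiles on its own; the class name `SerializationSketch` is likewise only for illustration.

```csharp
using System;
using System.IO;

static class SerializationSketch
{
    // 'serialize' stands in for binForm.Serialize(ms, obj) from the question.
    public static byte[] ObjectToByteArray(object obj, Action<Stream, object> serialize)
    {
        if (obj == null)
            return null;

        // The using block guarantees Dispose() runs even if serialize throws.
        using (var ms = new MemoryStream())
        {
            serialize(ms, obj);
            // ToArray() copies the written bytes, so the result stays valid
            // after the stream has been disposed.
            return ms.ToArray();
        }
    }
}
```

In the original code this would amount to moving the existing `binForm.Serialize(ms, obj)` and `return ms.ToArray()` lines inside the `using` block.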