I need to split data evenly across n nodes in a distributed cache.
The following code will take a cache key and determine which Node to use:
```csharp
public static int GetNodeIDByCacheKey(string key)
{
    // Note: Math.Abs throws OverflowException if GetHashCode() returns int.MinValue.
    return Math.Abs(key.GetHashCode()) % TotalNumberOfNodes();
}
```
Unfortunately, the code isn’t reliable across different machine instances.
In testing, it sometimes returns a different Node for the same key.
Any thoughts or ideas on getting something that works better?
You should not rely on the implementation of string's GetHashCode() beyond the guarantee that strings of equal value produce the same hash code. According to the documentation, the particular hash value is only required to be consistent within the current execution of an application; a different hash code can be returned if the application is run again. The implementation of GetHashCode() may also differ if the machines in question run different .NET CLR versions. On .NET Core and .NET 5+, string hash codes are randomized per process by default, so they can differ between two runs on the same machine.
Instead, define a consistent mapping from your string key to a numeric value. That lets you assign keys to nodes consistently across restarts and machine boundaries. For example, convert the string into a byte array (e.g. using Encoding.UTF8.GetBytes()) and then convert the byte array to a number, either with a lossy conversion that keeps just 64 bits or with BigInteger.
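A minimal sketch of both conversions described above, assuming the node count is passed in as a plain int instead of calling TotalNumberOfNodes(). The class and method names (NodeSelector, GetNodeIdBigInteger, GetNodeIdFnv1a) are hypothetical. For the lossy 64-bit option this swaps in FNV-1a, one well-known non-cryptographic hash, to fold every byte of the key into a ulong; that choice is an illustration, not something the answer specifies.

```csharp
using System;
using System.Numerics;
using System.Text;

// Hypothetical helper: deterministic key -> node mapping that does not
// depend on string.GetHashCode(), so it is stable across processes and machines.
public static class NodeSelector
{
    // Option 1: treat the UTF-8 bytes of the key as one large non-negative integer.
    public static int GetNodeIdBigInteger(string key, int nodeCount)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        if (nodeCount <= 0) throw new ArgumentOutOfRangeException(nameof(nodeCount));

        byte[] utf8 = Encoding.UTF8.GetBytes(key);

        // BigInteger(byte[]) reads little-endian two's complement, so append a
        // trailing zero byte to guarantee the value is non-negative.
        byte[] unsignedBytes = new byte[utf8.Length + 1];
        Array.Copy(utf8, unsignedBytes, utf8.Length);
        var value = new BigInteger(unsignedBytes);

        return (int)(value % nodeCount);
    }

    // FNV-1a 64-bit parameters (assumed choice of hash for the lossy variant).
    private const ulong FnvOffsetBasis = 14695981039346656037UL;
    private const ulong FnvPrime = 1099511628211UL;

    // Option 2: lossy conversion that folds all bytes into 64 bits with FNV-1a.
    public static int GetNodeIdFnv1a(string key, int nodeCount)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        if (nodeCount <= 0) throw new ArgumentOutOfRangeException(nameof(nodeCount));

        ulong hash = FnvOffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;                          // mix in the next byte
            hash = unchecked(hash * FnvPrime);  // wrap-around multiply is intended
        }

        return (int)(hash % (ulong)nodeCount);
    }
}
```

Both functions return the same node for the same key on every machine and every run. With the BigInteger variant, though, a node count that is a power of two (up to 256) only looks at the low bits of the first byte, so keys sharing a first character collide; hashing all bytes first, as the FNV-1a variant does, avoids that. Either way, changing the node count still remaps most keys, which is inherent to the modulo scheme in the question.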