Is there a way to test the quality of a hash function? I want to have a good spread when used in the hash table, and it would be great if this is verifyable in a unit test.
EDIT: For clarification, my problem was that I have used long values in Java in such a way that the first 32 bit encoded an ID and the second 32 bit encoded another ID. Unfortunately Java’s hash of long values just XORs the first 32 bit with the second 32 bits, which in my case led to very poor performance when used in a HashMap. So I need a different hash, and would like to have a Unit Test so that this problem cannot creep in any more.
You have to test your hash function using data drawn from the same (or similar) distribution that you expect it to work on. When looking at hash functions on 64-bit longs, the default Java hash function is excellent if the input values are drawn uniformly from all possible long values.
However, you’ve mentioned that your application uses the long to store essentially two independent 32-bit values. Try to generate a sample of values similar to the ones you expect to actually use, and then test with that.
For the test itself, take your sample input values, hash each one and put the results into a set. Count the size of the resulting set and compare it to the size of the input set, and this will tell you the number of collisions your hash function is generating.
For your particular application, instead of simply XORing them together, try combining the 32-bit values in ways a typical good hash function would combine two indepenet ints. I.e. multiply by a prime, and add.