Can someone please explain to me the static HashMap#hash(int) method?
What’s the justification behind it, and how does it help generate uniformly distributed hashes?
/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions. This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
An example would make it easier to digest.
Clarification
I’m aware of the operators, truth tables and bitwise operations. I just can’t decode the implementation or the comment, or even the reasoning behind it.
>>> is the logical right shift (no sign-extension) (JLS 15.19 Shift Operators), and ^ is the bitwise exclusive-or (JLS 15.22.1 Integer Bitwise Operators).
As to why this is done, the documentation offers a hint: HashMap uses power-of-two length tables, and hashes keys by masking away the higher bits and taking only the lower bits of their hash code.
So hash() attempts to bring relevancy to the higher bits, which otherwise would get masked away (indexFor basically discards the higher bits of h and takes only the lower k bits where length == (1 << k)).
Contrast this with the way Hashtable (which should NOT have a power-of-two length table) uses a key’s hash code.
By doing the more expensive % operation (instead of simple bit masking), the performance of Hashtable is less sensitive to hash codes with poor distribution in the lower bits (especially if table.length is a prime number).
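Since an example was asked for, here is a minimal sketch that may help. It copies hash() from the question and assumes indexFor is a plain bit mask, as described above. The class HashSpreadDemo and the helper moduloIndex are hypothetical names made up for this illustration. moduloIndex only approximates the Hashtable approach (clear the sign bit, then take the remainder), and the table lengths 16 and 11 are arbitrary choices.

```java
// Illustrative sketch only: HashSpreadDemo and moduloIndex are hypothetical names.
final class HashSpreadDemo {

    // Supplemental hash, copied from the method quoted in the question.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Assumed masking step: keeps only the low k bits, where length == (1 << k).
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Hashtable-style alternative: clear the sign bit, then take the remainder.
    static int moduloIndex(int h, int length) {
        return (h & 0x7FFFFFFF) % length;
    }

    public static void main(String[] args) {
        int powerOfTwoLength = 16; // masking keeps the low 4 bits
        int primeLength = 11;      // arbitrary prime for the modulo variant

        for (int i = 1; i <= 4; i++) {
            int h = i << 16; // hash codes that differ only in their upper bits
            System.out.printf(
                "h=0x%05X  mask(raw)=%d  mask(hash(h))=%d  modulo=%d%n",
                h,
                indexFor(h, powerOfTwoLength),
                indexFor(hash(h), powerOfTwoLength),
                moduloIndex(h, primeLength));
        }
    }
}
```

With these four inputs, masking the raw hash codes puts every key in bucket 0, because their low 4 bits are all zero. After hash() folds the upper bits down, the same keys land in buckets 1, 2, 3 and 4. The modulo variant separates them as well without any extra mixing, at the cost of a division.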