I have 64-bit unsigned integers (ranging from 0 to 2^63 – 1) and I want to hash them into 32-bit unsigned integers (0 to 2^31 – 1 range).
The data follows a uniform distribution. Can anyone suggest a hash function that gives a low number of collisions for this distribution (maybe with some probability of collision)?
If the distribution of the data is really uniform, then just take the lower n bits (where n is the width of the hash value). This means that in the worst case you can have 2^(N-n) elements in a bucket (here N denotes the width of the original number).

Note: I just saw that @JanDvorak already suggested this (before my answer); using modulo 2^n is equivalent to taking the lower n bits.

If this is really about 64-bit unsigned integers being hashed into 32-bit unsigned integers, then the correct ranges would be [0; 2^64 - 1] and [0; 2^32 - 1], with at most 2^32 collisions on a single hash. However, in Java, there is no unsigned integer…
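A minimal sketch of the "take the lower n bits" idea in Java, assuming the 64-bit input is held in a `long` and the 32-bit hash in an `int` (the class and method names are hypothetical, chosen only for illustration). Because Java has no unsigned `int`, the narrowing cast keeps the lower 32-bit pattern, and `Integer.toUnsignedLong` or a mask can be used to read it as a value in [0; 2^32 - 1]. It also checks that masking agrees with unsigned modulo 2^n.

```java
// Sketch: hash a 64-bit value by keeping only its lower bits.
// LowerBitsHash and its methods are hypothetical names for illustration.
final class LowerBitsHash {

    // Keep the lower 32 bits. The narrowing cast discards the upper 32 bits;
    // the result is a signed int holding the same 32-bit pattern.
    static int hash32(long x) {
        return (int) x;
    }

    // The same 32-bit pattern read as an unsigned value in [0; 2^32 - 1].
    static long hash32Unsigned(long x) {
        return x & 0xFFFF_FFFFL;
    }

    // Lower n bits for 1 <= n <= 63; equals x mod 2^n with x read as unsigned.
    static long lowerBits(long x, int n) {
        return x & ((1L << n) - 1);
    }

    public static void main(String[] args) {
        long x = 0x1234_5678_9ABC_DEF0L;
        System.out.println(Integer.toHexString(hash32(x)));                       // 9abcdef0
        System.out.println(hash32Unsigned(x) == Integer.toUnsignedLong(hash32(x))); // true
        // Masking agrees with unsigned modulo 2^n.
        System.out.println(lowerBits(x, 20) == Long.remainderUnsigned(x, 1L << 20)); // true
        // Values differing only in the upper 32 bits land in the same bucket.
        System.out.println(hash32(x) == hash32(x + (1L << 32)));                  // true
    }
}
```

The last line illustrates the worst case: with N = 64 and n = 32, every hash value is shared by 2^32 inputs that differ only in their upper bits.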
If this is about using the positive half of signed 64-bit and 32-bit integer values respectively, then your range values are right, and you will still have 2^32 collisions in the worst case.
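For the "positive half" reading, a sketch under the assumption that inputs are non-negative `long` values in [0; 2^63 - 1] and hashes must be non-negative `int` values in [0; 2^31 - 1] (names are hypothetical). Keeping the lower 31 bits gives N = 63 and n = 31, so the worst-case bucket size is 2^(63-31) = 2^32.

```java
// Sketch: inputs in [0; 2^63 - 1], hashes in [0; 2^31 - 1].
// PositiveHalfHash and its methods are hypothetical names for illustration.
final class PositiveHalfHash {

    // Keep the lower 31 bits; the result is always a non-negative int.
    static int hash31(long x) {
        return (int) (x & 0x7FFF_FFFFL);
    }

    // Worst-case bucket size 2^(N - n) with N = 63 input bits and n = 31 hash bits.
    static long worstCaseBucketSize() {
        return 1L << (63 - 31);
    }

    public static void main(String[] args) {
        long x = 42L;
        // Values that differ only above bit 30 collide.
        System.out.println(hash31(x) == hash31(x + (1L << 31)));   // true
        System.out.println(hash31(Long.MAX_VALUE));              // 2147483647
        System.out.println(worstCaseBucketSize() == (1L << 32)); // true
    }
}
```

Each hash value h is produced by exactly the inputs h + k * 2^31 for k = 0, ..., 2^32 - 1, which is where the 2^32 worst-case figure comes from.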