This is my first question on Stack Overflow. As you can tell, I am new to learning algorithms and data structures.
When using the division method to create a hash function (i.e. h(k) = k mod m), one is advised (e.g. by CLRS) to choose the divisor m to be a prime not too close to a power of 2. Could someone kindly explain to me why choosing m to be a composite number is bad?
Consider the case where m is even and all the k values are even. Then the hash values will also all be even.
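A one-line argument for why the parity carries over, using only the definition of the remainder (the names a and b are introduced here just for the derivation):

```latex
k = 2a,\quad m = 2b \;\Longrightarrow\; k \bmod m = k - m\left\lfloor k/m \right\rfloor = 2a - 2b\left\lfloor k/m \right\rfloor = 2\left(a - b\left\lfloor k/m \right\rfloor\right)
```

The remainder is twice an integer, so it is even.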
For example, consider the case m = 6 with even keys:
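A small sketch that prints h(k) = k mod 6 for a run of even keys. The keys 0, 2, ..., 22 are an illustrative choice, not necessarily the values the answer originally listed:

```python
# Division-method hash h(k) = k mod m with m = 6, applied to even keys only.
# The key range 0, 2, ..., 22 is an illustrative assumption.
m = 6
keys = list(range(0, 24, 2))  # 12 even keys: 0, 2, ..., 22

hashes = [k % m for k in keys]
for k, h in zip(keys, hashes):
    print(f"{k:2d} mod {m} = {h}")

# Collect which table slots were actually hit.
used = sorted(set(hashes))
print("slots used:", used)  # only the even slots 0, 2, 4
```

The remainders cycle 0, 2, 4, 0, 2, 4, ..., so the odd slots 1, 3 and 5 are never reached.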
If you use these hash values as indices into a table, then half of the table will be unused: only slots 0, 2 and 4 can ever be hit. On the other hand, if m is prime, you will get an even distribution of the hash values even if the input values all have a common factor, as long as that factor is not a multiple of m.
Consider the same input values, but with m=7:
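A sketch that counts how many even keys land in each slot when m = 7. It assumes the keys 0, 2, ..., 26 (14 keys, a multiple of 7, so the counts come out exact); these are illustrative values:

```python
from collections import Counter

# Division-method hash with the prime divisor m = 7, applied to even keys only.
# The key range 0, 2, ..., 26 is an illustrative assumption.
m = 7
keys = list(range(0, 28, 2))  # 14 even keys
counts = Counter(k % m for k in keys)

for slot in range(m):
    print(f"slot {slot}: {counts[slot]} key(s)")
```

Every slot from 0 to 6 receives exactly two keys: the remainders run 0, 2, 4, 6, 1, 3, 5 and then repeat.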
Despite the fact that the input values are all even, the hash values are still uniformly distributed over [0..6].
So, to summarize: if m is prime, then you'll still get an even distribution of hash values even if all the input values are divisible by a common factor (as long as that factor is not a multiple of m).
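The summary can be checked more generally. If every key is a multiple of some common factor d, the remainders k mod m can only be multiples of gcd(d, m), so at most m / gcd(d, m) slots are reachable. When m is prime and d is not a multiple of m, gcd(d, m) = 1 and every slot is reachable. The helper `slots_reached` below is a hypothetical name used only for this sketch:

```python
from math import gcd


def slots_reached(d: int, m: int, n_keys: int = 1000) -> int:
    """Count distinct values of k mod m over the keys k = 0, d, 2d, ..., (n_keys - 1) * d.

    Hypothetical helper for illustration; n_keys should be at least m so the
    remainders have time to cycle through every value they can take.
    """
    return len({(i * d) % m for i in range(n_keys)})


for m in (6, 7, 8, 11, 12, 13):
    for d in (2, 3, 4):
        reached = slots_reached(d, m)
        # Multiples of d modulo m are exactly the multiples of gcd(d, m),
        # so the number of reachable slots is m // gcd(d, m).
        assert reached == m // gcd(d, m)
        print(f"m={m:2d}, common factor d={d}: {reached}/{m} slots used")
```

For the composite divisors (6, 8, 12) some common factors leave slots unused, while for the primes (7, 11, 13) every factor tried reaches all m slots.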