I had a weird idea about a hashing function. The problem statement is:
You are storing id-numbers of 162 students in a class obtaining n marks out of 300 in a course (for each n=0, 1, 2, … 300) in a hash table. Devise the simplest and least collision prone hash function for this such that the wasted memory cells also are minimum. Here, a collision is when two students scoring n1 and n2 get the same slot in the hash table.
One solution is to use h(n) = (n*5 + 7) % 163 along with chaining. Since there are only 162 students, there can be at most 162 distinct marks.
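For reference, here is a minimal sketch of that baseline in Python. The names `build_table` and `lookup` and the sample records are made up for illustration; only the formula for `h` comes from the question.

```python
TABLE_SIZE = 163  # prime, one more than the 162 students


def h(n: int) -> int:
    """Hash from the question: marks n in 0..300 -> slot in 0..162."""
    return (n * 5 + 7) % TABLE_SIZE


def build_table(records):
    """Insert (student_id, marks) pairs, chaining entries that share a slot."""
    table = [[] for _ in range(TABLE_SIZE)]
    for student_id, marks in records:
        table[h(marks)].append((marks, student_id))
    return table


def lookup(table, marks):
    """Return the ids of all students who scored exactly `marks`."""
    return [sid for m, sid in table[h(marks)] if m == marks]


# Hypothetical records: (student_id, marks)
records = [(1001, 0), (1002, 163), (1003, 150)]
table = build_table(records)
print(lookup(table, 163))
```

With this function, two different marks share a slot only when they differ by exactly 163 (for example 0 and 163), because 5 has an inverse modulo the prime 163.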
EDIT: There are several standard ways to do this, but I'd like to try my own idea and check it (maybe mathematically). It just might have fewer collisions with less memory.
Now, here's the idea I had. I can assume the distribution of marks to be Gaussian. So, there are more people near the average score and fewer at the extremes.
So, I can have a hash function something like this:
h(n) = 0 (if n<100 || n>=200)
h(n) = 1 (if 100<=n<125 || 175<=n<200)
h(n) = 2 (if 125<=n<140 || 160<=n<175)
h(n) = 3 (if 140<=n<160)
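Written out as runnable Python, the scheme might look like the sketch below. It assumes that n = 200 falls in slot 0 together with the other high marks, and it counts how many of the 301 possible marks land in each slot. The name `marks_per_slot` is just for illustration.

```python
from collections import Counter


def h(n: int) -> int:
    """Piecewise hash from the question: narrow ranges near the middle,
    wide ranges at the extremes."""
    if 140 <= n < 160:
        return 3
    if 125 <= n < 140 or 160 <= n < 175:
        return 2
    if 100 <= n < 125 or 175 <= n < 200:
        return 1
    return 0  # n < 100 or n >= 200 (assumption: 200 belongs here)


# How many distinct marks in 0..300 map to each slot
marks_per_slot = Counter(h(n) for n in range(301))
print(sorted(marks_per_slot.items()))
```

Slot 0 covers 201 of the 301 possible marks while slot 3 covers only 20, so whether the slots end up equally loaded depends entirely on how concentrated the real marks are around the middle.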
For some number of such conditions (say, k), I expect the hash table to have the fewest collisions and the least space occupied.
Now, this is just a guess. Does something like this make sense? Is there a way to prove this? Or am I wrong somewhere?
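What you are doing manually here is called histogram equalization in image processing. I think it makes perfect sense: if you choose the range boundaries so that every slot is hit with the same probability, you minimize collisions. For two independent students, the chance of landing in the same slot is the sum of the squared slot probabilities, and that sum is smallest when all slots are equally likely.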
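A minimal sketch of that equalization, assuming the marks follow a normal distribution: scale the cumulative distribution function by the number of slots k. The mean of 150 and standard deviation of 30 are invented for illustration and would have to be estimated from real data. Unlike the scheme in the question, each slot here is a single contiguous range of marks.

```python
from statistics import NormalDist

# Hypothetical distribution of marks (assumed, not from the question)
MARKS = NormalDist(mu=150, sigma=30)
K = 4  # number of slots


def h(n: int, k: int = K) -> int:
    """Equalizing hash: the CDF maps marks to [0, 1), and scaling by k
    cuts that interval into k equally likely slots."""
    return min(int(MARKS.cdf(n) * k), k - 1)


def slot_masses(k: int = K):
    """Approximate probability of each slot, treating mark n as the
    interval [n - 0.5, n + 0.5) under the assumed distribution."""
    masses = [0.0] * k
    for n in range(301):
        masses[h(n, k)] += MARKS.cdf(n + 0.5) - MARKS.cdf(n - 0.5)
    return masses


print([round(m, 3) for m in slot_masses()])
```

Because marks are integers, the slot probabilities come out only approximately equal, and the result is only as good as the assumed distribution.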