I had the following code to generate a hash of an object:
public int GetHashCode(MyType obj)
{
return (obj.Prop1.GetHashCode() + obj.Prop2.GetHashCode() + obj.Prop3.GetHashCode()).GetHashCode();
}
I.e. I add all the properties’ hash codes and then take the hash of this.
In review, a coworker suggested that this will collide too frequently. I’m not sure that this is true because:
- Given that hash codes are chosen with equal frequency among positive and negative numbers and they wrap around, I don’t think there’s any additional information we gain about the likelihood of these numbers’ sum as opposed to the numbers themselves
- To the extent that their sum is non-random, hash codes are designed to make numbers that are “close together” become “far apart”, so feeding a non-uniformly-distributed value into the function shouldn’t be an issue
Who is correct?
It is in C#, in case the answer is language-specific.
Yes.
Just suppose Prop1, Prop2 etc are of type
int. Usually only the lower range of integers is used. Your sum approach will collide more often than necessary.The HasCode of
7is 7, which makes perfect sense when hashingintby it self. But with your code the tuples<7, 3>,<3, 7>and<8, 2>will all have the same Hash. The same with simple XOR instead of Addition.The common approach is to add some (prime) numbers and shifting:
The numbers 19, 31, 37 are not too critical. And if you prefer you can use OR or XOR instead of
+.