I’ve got a simple list of objects. I would like to compute a kind of hash value for each object, used to sort the list.
My question is: who is responsible for computing the hash?
1/ The list
Because the hash method is specific to the list, and applied to each object.
Objects are just objects, they don’t know about sorting and hashing.
2/ Each object
Because the object is the best candidate, as it has all the data needed to do it.
And this may be computed on internal data not accessible to others.
3/ Another candidate? For example, a controller between the list and the objects?
The class that defines the concept of equivalence used.
If an object defines a general purpose concept of equality, then it should define a hashcode that corresponds with that, as part of that job.
After all, it is that class that “knows” how Equals(), isEqual(), areEqual, == or whatever is defined. It is necessary that when a == b, then hash(a) == hash(b), so it is the only class that can do so.
However, if another class defines a concept of equality (perhaps because different senses are needed for different purposes; the different ways one might consider strings equal or not are a classic example), then that class must define the hashcode, for the same reasons.
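As a minimal sketch of the first case, here is a C++11 type that defines its own equality and therefore also supplies the matching hash. Point, its fields and count_distinct are hypothetical names made up for this illustration; the only real library pieces are std::hash and std::unordered_set.

```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <unordered_set>

// Hypothetical value type: two points are equal when both coordinates match.
struct Point {
    int x;
    int y;
    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

// The hash lives next to the equality it has to agree with, and it uses
// exactly the fields that operator== compares, so a == b implies equal hashes.
namespace std {
template <>
struct hash<Point> {
    size_t operator()(const Point& p) const noexcept {
        size_t hx = hash<int>{}(p.x);
        size_t hy = hash<int>{}(p.y);
        return hx * 31 + hy;  // simple combination, chosen only for illustration
    }
};
}  // namespace std

// Counts distinct points; equal points collapse into one entry because they
// compare equal and land in the same bucket.
inline std::size_t count_distinct(std::initializer_list<Point> points) {
    std::unordered_set<Point> seen(points);
    return seen.size();
}
```

If the hash used a field that operator== ignores, two equal points could end up in different buckets and the set would silently keep both.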
.NET expresses this linking of responsibility, as an example. In .NET all objects have an Equals(). Whether that’s a good thing or not is debatable (some would prefer the C++ approach where an object need not have any sense of being equal to another), but once done it does make sense that all objects also have a GetHashCode(), because of the link between one and the other. .NET also has IEqualityComparer<T> and IEqualityComparer, which define a means for a class to take responsibility for a particular non-built-in sense of equality. Here again, taking responsibility for one requires taking responsibility for the other.
Now, which is better?
Well, if there is an overwhelmingly obvious sense of what “equals” means in a given case, it should probably be handled by the class: two representations of the same coordinate, of the same complex number, or of the same real-world object should probably be considered equal most of the time. So that gives a default use.
If there’s an overwhelmingly obvious sense of what equals should mean in the context of a given container type, then that should be applied there.
Otherwise there should be a connector that defines it. We can hence separate the concerns nicely.
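A connector of this kind can be sketched in C++11 as a pair of function objects handed to the container. The names CaseInsensitiveEqual, CaseInsensitiveHash and CaseInsensitiveSet are assumptions for this example, and the comparison only handles ASCII case; it is an illustration, not a complete treatment of string equality.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical connector, equality half: strings match when they are the
// same ignoring ASCII case.
struct CaseInsensitiveEqual {
    bool operator()(const std::string& a, const std::string& b) const {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (std::tolower(static_cast<unsigned char>(a[i])) !=
                std::tolower(static_cast<unsigned char>(b[i]))) {
                return false;
            }
        }
        return true;
    }
};

// Hash half: it hashes the lower-cased characters, so any two strings that
// CaseInsensitiveEqual treats as equal produce the same hash.
struct CaseInsensitiveHash {
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) {
            h = h * 31 + static_cast<std::size_t>(std::tolower(c));
        }
        return h;
    }
};

// The string class itself is untouched; the set is told which notion of
// equality (and therefore which hash) to use.
using CaseInsensitiveSet =
    std::unordered_set<std::string, CaseInsensitiveHash, CaseInsensitiveEqual>;
```

Neither std::string nor the container had to change: the two functors carry the whole responsibility for this sense of equality, hash included.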
However, we can tie the three together quite neatly. We define a default connector. Its implementation merely passes the call to get a hashcode or test for equality to that defined on the object.
We define any general-purpose hash tables, hash sets etc. to always make use of connectors, with defaults upon construction or default template parameters (if the language has the sort of generics/templates approach that allows this, which C++, for example, does and C#, for example, does not), so that by default we are using this default connector.
When defining special-purpose collection types that depend upon a particular concept of equality for their very purpose, we build them from one of those collections, overriding the connector.
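The three steps above could look roughly like this in C++11, using a default template parameter. DefaultConnector, ConnectorHash, ConnectorEqual, HashSet, SameLengthConnector and LengthClassSet are all hypothetical names invented for this sketch, and the "same length" equivalence is only a toy example of a special-purpose notion of equality.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Default connector: merely forwards to the equality and hash that are
// already defined for T itself.
template <typename T>
struct DefaultConnector {
    bool equals(const T& a, const T& b) const { return a == b; }
    std::size_t hash(const T& value) const { return std::hash<T>{}(value); }
};

// Small adapters so that std::unordered_set can call into a connector.
template <typename T, typename Connector>
struct ConnectorHash {
    std::size_t operator()(const T& value) const {
        return Connector{}.hash(value);
    }
};

template <typename T, typename Connector>
struct ConnectorEqual {
    bool operator()(const T& a, const T& b) const {
        return Connector{}.equals(a, b);
    }
};

// General-purpose set: it always goes through a connector, and the default
// template parameter selects DefaultConnector when none is given.
template <typename T, typename Connector = DefaultConnector<T>>
using HashSet = std::unordered_set<T, ConnectorHash<T, Connector>,
                                   ConnectorEqual<T, Connector>>;

// Toy special-purpose connector: strings are equivalent when they have the
// same length, and the hash is derived from the length alone.
struct SameLengthConnector {
    bool equals(const std::string& a, const std::string& b) const {
        return a.size() == b.size();
    }
    std::size_t hash(const std::string& s) const {
        return std::hash<std::size_t>{}(s.size());
    }
};

// Special-purpose collection built from the general one by overriding the
// connector: it keeps one representative string per length.
using LengthClassSet = HashSet<std::string, SameLengthConnector>;
```

With these definitions, HashSet<std::string> behaves like an ordinary set of strings, while LengthClassSet treats "ab" and "cd" as the same element.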
The flip side of this rule is that if you don’t have a defined means for all equatable objects to give a hashcode (e.g. you have an == override mechanism, but no deep support for a GetHashCode()), then you have to use the connector approach. Note that while C++, for example, does have ==, it doesn’t have that sort of support for knowing how to hash a given object. Hence the STL having to have a hash_map and there being only very limited support for out-of-the-box defaults for it.