I am arguing with a colleague about whether all Python classes really need to be hashable. We have a class that holds symbolic expressions (something similar to SymPy).
My argument is that since we cannot reliably compare two expressions for equality, hashing should not be allowed. For example, the expressions ‘(x)’ and ‘(1*x)’ might compare equal, whereas ‘sqrt(x*x*x)’ and ‘abs(x)*sqrt(x)’ might not. Therefore, ‘hash()’ should throw an error when called with a symbolic expression.
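For illustration, a minimal sketch of the behavior being proposed. The `Expr` class, its `text` field, and the `try_hash` helper are hypothetical stand-ins, not our real code. In Python 3, a class that defines `__eq__` without `__hash__` already gets `__hash__ = None`, and built-in types such as `list` behave the same way:

```python
class Expr:
    """Hypothetical stand-in for the symbolic expression class."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Placeholder comparison; a real class would compare symbolically.
        if not isinstance(other, Expr):
            return NotImplemented
        return self.text == other.text

    # Python 3 sets __hash__ to None automatically when __eq__ is defined
    # without __hash__; spelling it out just makes the intent explicit.
    __hash__ = None


def try_hash(obj):
    """Return hash(obj), or the name of the exception that hash() raised."""
    try:
        return hash(obj)
    except TypeError as exc:
        return type(exc).__name__


expr_result = try_hash(Expr("x"))   # hashing the expression fails
list_result = try_hash([1, 2, 3])   # so does hashing a built-in list
```

Both calls end in a `TypeError`, so an unhashable expression class would at least be in the same company as `list`, `dict`, and `set`.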
His argument is that you should be able to use all classes as keys in dictionaries and sets. Therefore, they must also be hashable. (I’m putting words in his mouth now; he would have explained it better.)
Who is right? Is it unpythonic or not to have classes that throw errors if you try to hash them?
The problem I see here is that you’re working with two different notions of equality. If I understand your comment correctly, you’ve overridden `__eq__` to return an expression combining the two arguments to `==`. If said expression evaluates to True (in some sense), then the two expressions are equal; and if your expression class also implements `__nonzero__` (`__bool__` in Python 3) in such a way that `__nonzero__` returns `True` iff the expression is true, then superficially it seems as though this should work fine.
But in fact, it seems to me that the concept of equality you’ve defined is very different from the normal concept of equality at work in Python. A fundamental requirement for hashability is that, if two items evaluate as equal, then they should be completely interchangeable. And while two of your expression objects may evaluate as “equal,” I’m not certain they’re interchangeable! After all, `5 + 5` and `8 + 2` evaluate to the same result, but they are not identical, are they? And given these two expressions, I suspect many people would expect them to hash to two separate bins in a dictionary!
That behavior would be difficult, however, without giving `__eq__` a more conventional definition. And as the docs say, “Hashable objects which compare equal must have the same hash value.” So if `__eq__` says that `5 + 5` and `8 + 2` are equal, then they must hash to the same value. That means that to make your expressions hashable as they are now, you’d have to choose a `__hash__` that is able to determine a canonical form for all expressions that evaluate as equal. That sounds awfully hard to me.
In short, if these expressions are immutable, and if you redefine `__eq__` to return `True` iff the expressions are identical (a stronger requirement than “equal”), then there should be no problem making them hashable. On the other hand, I don’t see anything wrong with an unhashable immutable type; and I wouldn’t recommend trying to make your expressions hashable without redefining `__eq__`.
So it all comes down to how badly you want to define `__eq__` in an unconventional way. I guess on balance I would go with a conventional definition of `__eq__`, simply to avoid producing unexpected behavior. After all, special cases aren’t special enough to break the rules.
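As a rough sketch of the conventional route, assuming a hypothetical immutable `Expr` class (the `op`/`args` fields and the `value`/`same_value` helpers are made up for illustration, not taken from your code): `__eq__` tests structural identity, `__hash__` hashes the same key, and the looser "evaluates to the same thing" question gets its own method instead of hijacking `==`.

```python
class Expr:
    """Hypothetical immutable expression node: an operator plus operands."""

    __slots__ = ("op", "args")

    def __init__(self, op, *args):
        # Bypass our own __setattr__ once, during construction only.
        object.__setattr__(self, "op", op)
        object.__setattr__(self, "args", tuple(args))

    def __setattr__(self, name, value):
        raise AttributeError("Expr is immutable")

    def _key(self):
        # Everything that defines structural identity goes in here.
        return (self.op, self.args)

    def __eq__(self, other):
        # Conventional equality: same operator and same operands.
        if not isinstance(other, Expr):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Built from the same key as __eq__, so equal objects hash equally.
        return hash(self._key())

    def value(self):
        """Numeric value; this sketch only handles '+' over plain numbers."""
        if self.op != "+":
            raise NotImplementedError(self.op)
        return sum(self.args)

    def same_value(self, other):
        """The looser, 'mathematical' notion of equality, kept out of ==."""
        return self.value() == other.value()


a = Expr("+", 5, 5)
b = Expr("+", 8, 2)
c = Expr("+", 5, 5)

identical = a == c            # True: same structure
different = a == b            # False: different structure
equivalent = a.same_value(b)  # True: both evaluate to 10
bins = len({a, b, c})         # 2: a and c share a bin, b gets its own
```

With this split, `5 + 5` and `8 + 2` land in separate dictionary slots, while code that cares about mathematical equivalence asks for it explicitly.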