I quite often use unusual objects as dictionary keys, so I am wondering what the right way to do it is, and this comes down to implementing good hash methods for my objects. I am aware of other questions asked here, such as "good way to implement hash", but I’d like to understand how the default __hash__ works for custom objects, and whether it is possible to rely on it.
I have noticed that mutable built-ins are explicitly unhashable, since hash({}) raises an error, but strangely, custom classes are hashable:
>>> class Object(object): pass
>>> o = Object()
>>> hash(o)
So, does anybody know how this default hash function works? By understanding this, I’d like to know:
Can I rely on this default hash if I use objects of the same type as keys of a dictionary? E.g.:
key1 = MyObject()
key2 = MyObject()
key3 = MyObject()
{key1: 1, key2: 'blabla', key3: 456}
Can I rely on it if I use objects of different types as keys in a dictionary? E.g.:
{int: 123, MyObject(10): 'bla', 'plo': 890}
And in the last case, how can I make sure that my custom hashes don’t clash with the built-in hashes? E.g.:
{int: 123, MyObject(10): 'bla', MyObjectWithCustomHash(123): 890}
What you can rely on: custom objects have a default hash() that is based in some way on the identity of the object, i.e. any object using the default hash will have a constant value for that hash over its lifetime, and different objects may or may not have a different hash value.
You cannot rely on any particular relationship between the value returned by id() and the value returned by hash(). In the standard C implementation of Python 2.6 and earlier they were the same; in Python 2.7-3.2, hash(x) == id(x)/16.
Edit: originally I wrote that in releases 3.2.3 and later or 2.7.3 or later the hash value may be randomised, and in Python 3.3 the relationship will always be randomised. In fact, that randomisation at present only applies to hashing strings, so the divide-by-16 relationship may continue to hold for now, but don’t bank on it.
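As a minimal sketch of what that guarantee looks like in practice (the class name Plain and the attribute some_attribute are made up for illustration; the class defines neither __eq__ nor __hash__, so it gets the defaults):

```python
class Plain(object):
    """Hypothetical class that relies on the default __eq__ and __hash__."""
    pass


a = Plain()
b = Plain()

# The default hash is tied to the object's identity, not its contents,
# so mutating the instance leaves the hash unchanged.
hash_before = hash(a)
a.some_attribute = 42
hash_after = hash(a)

# Default equality is identity, so a and b are two distinct keys.
table = {a: "first", b: "second"}
value_for_a = table[a]
value_for_b = table[b]
```

The code never looks at the numeric value of hash(a) or compares it with id(a). It only relies on the hash staying constant and on lookups finding the same object again.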
Hash collisions don’t usually matter: for a dictionary lookup to find an object, the key must have the same hash and must also compare equal. Collisions only matter if you get a very high proportion of them, such as in the denial-of-service attack that led to recent versions of Python being able to randomise the hash calculation.
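To illustrate why a clash with a built-in hash is harmless, here is a small sketch with a deliberately bad hash. Collider is a hypothetical class, not anything from the question, and every instance hashes to 1, the same value as hash(1):

```python
class Collider(object):
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Deliberately collides with hash(1) and with every other Collider.
        return 1

    def __eq__(self, other):
        # Equality, not the hash, decides whether two keys are the same.
        return isinstance(other, Collider) and self.name == other.name


x = Collider("x")
y = Collider("y")

# All three keys have the same hash, yet they remain separate entries
# because none of them compares equal to another.
table = {1: "int one", x: "collider x", y: "collider y"}

# A fresh but equal Collider finds the entry stored under x.
found = table[Collider("x")]
```

The dictionary still behaves correctly. The only cost of the collisions is that lookups have to do extra equality comparisons, which is why it only becomes a problem when a very large share of the keys collide.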