I’m trying to create a custom object that behaves properly in set operations.
I’ve generally got it working, but I want to make sure I fully understand the implications. In particular, I’m interested in the behavior when the object carries additional data that is not included in the __eq__ / __hash__ methods. It seems that the ‘intersection’ operation returns the objects from the set passed in as the argument, whereas the ‘union’ operation returns the objects from the set the method is called on.
To illustrate:
class MyObject:
    def __init__(self, value, meta):
        self.value = value
        self.meta = meta

    def __eq__(self, other):
        # Only value takes part in equality; meta is ignored.
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)

a = MyObject('1', 'left')
b = MyObject('1', 'right')
c = MyObject('2', 'left')
d = MyObject('2', 'right')
e = MyObject('3', 'left')

print(a == b)  # True
print(a == c)  # False

for i in set([a, c, e]).intersection(set([b, d])):
    print("%s %s" % (i.value, i.meta))
# prints (line order may vary):
# 1 right
# 2 right

for i in set([a, c, e]).union(set([b, d])):
    print("%s %s" % (i.value, i.meta))
# prints (line order may vary):
# 1 left
# 3 left
# 2 left
Is this behavior documented somewhere and deterministic? If so, what is the governing principle?
Nope, it’s not something you can rely on. The problem is that you’ve broken the assumption behind __eq__ and __hash__: two objects that compare equal are supposed to be interchangeable, so a set is free to keep either one. Fix your object rather than trying to be clever and depending on how set happens to be implemented. If the meta value is part of MyObject’s identity, it should be included in __eq__ and __hash__.
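If meta really is part of the identity, a minimal sketch of that fix could look like the following. It uses Python 3 syntax, and the _key helper is a name I made up for illustration, not something from the question:

```python
class MyObject:
    def __init__(self, value, meta):
        self.value = value
        self.meta = meta

    def _key(self):
        # Hypothetical helper: everything that defines this object's identity.
        return (self.value, self.meta)

    def __eq__(self, other):
        if not isinstance(other, MyObject):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hash the same tuple that __eq__ compares, so the two stay consistent.
        return hash(self._key())


a = MyObject('1', 'left')
b = MyObject('1', 'right')

# a and b are now different objects as far as a set is concerned,
# so neither can silently stand in for the other.
print(a == b)          # False
print(len({a, b}))     # 2
print(len({a} & {b}))  # 0
```

With this version the question of which duplicate survives goes away, because 'left' and 'right' objects are never treated as duplicates in the first place.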
You can’t rely on set’s intersection to keep the elements from one particular side, so there is no easy way to do what you want. What you would end up doing is taking the intersection by value only and then, for each result, looking through your original objects for the older one to replace it with. There is no nice way to do it algorithmically.
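To make that concrete, here is a rough sketch of one way to do it, assuming you want the result to hold the objects from the left-hand collection. The function name intersection_keep_left is made up for illustration, and it matches on .value directly instead of going through set.intersection:

```python
class MyObject:
    # Same shape as the class in the question: only value defines equality.
    def __init__(self, value, meta):
        self.value = value
        self.meta = meta

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


def intersection_keep_left(left, right):
    """Return the objects from `left` whose value also occurs in `right`."""
    # Intersect on the plain values, which carry no hidden metadata.
    right_values = {obj.value for obj in right}
    # Pick the survivors explicitly from `left`, so the side is never in doubt.
    return [obj for obj in left if obj.value in right_values]


left = [MyObject('1', 'left'), MyObject('2', 'left'), MyObject('3', 'left')]
right = [MyObject('1', 'right'), MyObject('2', 'right')]

for obj in intersection_keep_left(left, right):
    print("%s %s" % (obj.value, obj.meta))
# 1 left
# 2 left
```

It returns a list in the order of left rather than a set, which also sidesteps the iteration-order question. Swap the arguments if you want the right-hand objects instead.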
Unions are not so bad: