I have a list of objects (Foo). A Foo object has several attributes. An instance of a Foo object is equivalent (equal) to another instance of a Foo object iff (if and only if) all the attributes are equal.
I have the following code:
class Foo(object):
    def __init__(self, myid):
        self.myid = myid
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            print 'DEBUG: self:', self.__dict__
            print 'DEBUG: other:', other.__dict__
            return self.__dict__ == other.__dict__
        else:
            print 'DEBUG: ATTEMPT TO COMPARE DIFFERENT CLASSES:', self.__class__, 'compared to:', other.__class__
            return False
import copy
f1 = Foo(1)
f2 = Foo(2)
f3 = Foo(3)
f4 = Foo(4)
f5 = copy.deepcopy(f3) # overkill here (I know), but needed for my real code
f_list = [f1,f2,f3,f4,f5]
# Surely, there must be a better way? (this doesn't work BTW!)
new_foo_list = list(set(f_list))
I often used this little (anti?) ‘pattern’ above (converting to set and back), when dealing with simple types (int, float, string – and surprisingly datetime.datetime types), but it has come a cropper with the more involved data type – like Foo above.
So, how could I change the list f_list above into a list of unique items, without having to loop through each item and check whether it already exists in some temporary cache?
What is the most pythonic way to do this?
First, I want to emphasize that using set is certainly not an anti-pattern. sets eliminate duplicates in O(n) time, which is the best you can do, and way better than the naive O(n^2) solution of comparing every item to every other item. It’s even better than sorting, and indeed, it seems your data structure might not even have a natural order, in which case sorting doesn’t make a lot of sense.
The problem with using a set in this case is that you have to define a custom __hash__ method. Others have said this. But whether or not you can do so easily is an open question; it depends on details about your actual class that you haven’t told us. For example, if any attributes of a Foo object above are not hashable, then creating a custom hash function is going to be difficult, because you’ll have to not only write a custom hash for Foo objects, you’ll also have to write custom hashes for every other type of object!
So you need to tell us more about what kinds of attributes your class has if you want a conclusive answer. But I can offer some speculation.
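To make the __hash__ point concrete for the easy case: here is a minimal sketch, assuming every attribute of Foo is itself hashable and that you never mutate a Foo while it sits in a set. The name attribute and the _key helper are hypothetical, added only for illustration; the rule that matters is that __eq__ and __hash__ are built from the same data.

```python
class Foo(object):
    """Illustrative Foo whose attributes are all hashable (an assumption)."""

    def __init__(self, myid, name=""):
        self.myid = myid
        self.name = name  # hypothetical second attribute, for illustration

    def _key(self):
        # Every attribute that takes part in equality, packed into one tuple.
        return (self.myid, self.name)

    def __eq__(self, other):
        if isinstance(other, Foo):
            return self._key() == other._key()
        return NotImplemented

    def __hash__(self):
        # Equal objects must hash equally, so hash the same tuple __eq__ compares.
        return hash(self._key())


f_list = [Foo(1), Foo(2), Foo(3), Foo(4), Foo(3)]
unique = list(set(f_list))  # order of the result is not guaranteed
print(len(unique))  # 4
```

If any attribute were a list or a dict, hash(self._key()) would raise a TypeError, which is exactly the difficulty described above.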
Assuming that a hash function could be written for Foo objects, but also assuming that Foo objects are mutable and so really shouldn’t have a __hash__ method, as Niklas B. points out, here is one workable approach. Create a function freeze that, given a mutable instance of Foo, returns an immutable collection of the data in Foo. So for example, say Foo has a dict and a list in it; freeze returns a tuple containing a tuple of tuples (representing the dict) and another tuple (representing the list). The function freeze should have the following property: freeze(a) == freeze(b) if and only if a == b.
Now pass your list through the following code:
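A minimal sketch of that step, under stated assumptions: this Foo, its tags and scores attributes, and the name dupe_list are hypothetical stand-ins, the dict keys are assumed to be sortable, and all stored values are assumed to be hashable. Your real freeze will depend on what your class actually holds.

```python
class Foo(object):
    """Mutable example class holding a dict and a list (hypothetical attributes)."""

    def __init__(self, myid, tags, scores):
        self.myid = myid
        self.tags = tags      # a dict
        self.scores = scores  # a list

    def __eq__(self, other):
        return isinstance(other, Foo) and freeze(self) == freeze(other)

    __hash__ = None  # mutable, so deliberately left unhashable


def freeze(foo):
    """Return a hashable snapshot; freeze(a) == freeze(b) exactly when a == b."""
    return (
        foo.myid,
        tuple(sorted(foo.tags.items())),  # dict -> sorted tuple of (key, value) tuples
        tuple(foo.scores),                # list -> tuple
    )


dupe_list = [
    Foo(1, {"a": 1}, [1, 2]),
    Foo(2, {"b": 2}, [3]),
    Foo(1, {"a": 1}, [1, 2]),  # duplicate of the first item
]

# Key each object by its frozen form; a later duplicate replaces the earlier one.
dupe_free = list(dict((freeze(x), x) for x in dupe_list).values())
print(len(dupe_free))  # 2
```

The dict does the same job a set would, but it lets the hashable frozen key stand in for the unhashable object.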
Now you have a dupe free list in O(n) time. (Indeed, after adding this suggestion, I saw that fraxel suggested something similar; but I think using a custom function, or even a method as in (x.freeze(), x), is the better way to go, rather than relying on __dict__ as he does, which can be unreliable. The same goes for your custom __eq__ method, IMO: __dict__ is not always a safe shortcut for various reasons I can’t get into here.)
Another approach would be to use only immutable objects in the first place! For example, you could use namedtuples. Here’s an example stolen from the python docs:
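Something along these lines. Point is the example type used in the collections.namedtuple documentation; the de-duplication at the end is not from the docs and is added here only to tie it back to the question.

```python
from collections import namedtuple

# A lightweight immutable record type with two named fields.
Point = namedtuple('Point', ['x', 'y'])

p = Point(11, y=22)   # positional or keyword arguments both work
print(p[0] + p[1])    # 33 (indexable like a plain tuple)
print(p.x + p.y)      # 33 (fields are also accessible by name)

# namedtuples compare and hash by value, as long as their fields are hashable,
# so the set round trip removes duplicates without any custom methods.
points = [Point(1, 2), Point(3, 4), Point(1, 2)]
unique_points = list(set(points))
print(len(unique_points))  # 2
```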