Python has a very nice ‘set’ data structure which is basically an unordered list that enables set operations.
I would have been tempted to use such a data structure for the following purpose:
I have a set of datapoints from a survey (each point is a two-element Scipy/numpy array) that can be divided in different subsets according to the gender and marital status of the respondents.
Unfortunately, Python sets don’t seem to allow so-called mutable objects such as numpy arays and lists.
I could use tuples for my datapoints, but I am wondering if there is a better way to do this.
Ideally, I would like to have several unordered lists (sets) of datapoints that I could intersect, union, etc.. – and which I could iterate over (both over the individual datapoints, and over the list of sets for plotting purposes).
So my question is: Is using sets of tuples the only way to do what I want in this context ? Is it really impossible in Python to have sets of mutable elements (such as numpy arrays) ?
python-sets must be hashable in python. So you may define a
class datapointand implement__hash__(self)and__eq__(self)as a function of its elements and add instances of those into your set.Or maybe you want to use a named tuple. I have not tested them but they implement
__hash__and__eq__also. They are still tuples but at least, they may be accessed in a more readable fashion.