I have an API for analysing my exercise data (which I scrape from RunKeeper’s website).
My main class is a subclass of pandas.DataFrame, which is basically a container for tabular data. It supports indexing by column name, returning an array of the column values.
I would like to add some convenience properties based on the types of ‘fitness activities’ that are present in the data. So for example I’d like to add a property ‘running’:
@property
def running(self):
    return self[self['type'] == 'running']
Which would return all rows of the DataFrame which have ‘running’ in the ‘type’ column.
I tried to do this dynamically for all types present in the data. Here’s what I naively did:
class Activities(pandas.DataFrame):
    def __init__(self,data):
        pandas.DataFrame.__init__(self,data)
        # The set of unique types in the 'type' column:
        types = set(self['type'])
        for type in types:
            method = property(lambda self: self[self['type'] == type])
            setattr(self.__class__,type,method)
The result was that all of these properties ended up returning tables of data for the same type of activity (‘walking’).
What’s happening is that when the properties are accessed, the lambdas are called and they look in the scope they were defined in for the name ‘type’. They find that it is bound to the string ‘walking’, since that was the last iteration of the for loop. Each iteration of the for loop doesn’t have its own namespace, so all the lambdas see only the last iteration, rather than the value that ‘type’ had when they were actually defined.
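The same late-binding behaviour can be reproduced without pandas. The snippet below is only a minimal sketch: the names `getters`, `name`, and `late_bound` are made up for illustration, and a list is used instead of a set so that the last value is predictable.

```python
# Each lambda looks up `name` when it is *called*, not when it is defined.
getters = []
for name in ["running", "cycling", "walking"]:
    getters.append(lambda: name)

# By now the loop has finished and `name` is bound to its last value,
# so every lambda reports the same string.
late_bound = [getter() for getter in getters]
print(late_bound)
```

All three lambdas share one variable, which is why every property in the class above ends up filtering on the same type.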
Can anyone think of a way around this? I can think of two, but they don’t seem particularly ideal:
- define __getattr__ to check that the attribute is an activity type and return the appropriate rows.
- use a recursive function call instead of a for loop, so that each level of recursion has its own namespace.
Both of these are a little too clever for my tastes, and pandas.DataFrame already has a __getattr__ that I’d have to cautiously interact with if I made one too. And recursion would work, but feels very wrong since the set of types doesn’t have any intrinsic tree-like structure to it. It’s flat, and should look flat in the code!
Modify the lambda to pull the values into the new scope.
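One common way to do this is to bind the loop variable as a default argument, since defaults are evaluated when the lambda is defined. With the class from the question, that would presumably look like `property(lambda self, type=type: self[self['type'] == type])`. The sketch below shows the same idea without pandas: this `Activities` is a hypothetical stand-in that stores rows as plain dicts, and the loop variable is renamed to `activity_type` so that the builtin `type()` stays usable.

```python
class Activities:
    """Hypothetical stand-in for the DataFrame subclass; rows are plain dicts."""

    def __init__(self, rows):
        self.rows = list(rows)
        for activity_type in {row["type"] for row in self.rows}:
            # The default argument is evaluated right now, at definition time,
            # so each lambda keeps its own value instead of sharing the loop variable.
            prop = property(
                lambda self, activity_type=activity_type: [
                    row for row in self.rows if row["type"] == activity_type
                ]
            )
            setattr(type(self), activity_type, prop)


log = Activities([
    {"type": "running", "km": 5.0},
    {"type": "walking", "km": 2.0},
    {"type": "running", "km": 10.0},
])
running_km = [row["km"] for row in log.running]
walking_km = [row["km"] for row in log.walking]
print(running_km, walking_km)
```

Because a property getter is only ever called with `self`, the extra default parameter is never overridden. Note that, as in the question, the properties are attached to the class, so they become visible on every instance.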