Please educate me on how to do this the right way, as I feel my current way is long-winded.
I know iterating over all entities is not really how the App Engine datastore is designed to be used, but sometimes I want to gather statistics about my entities, for example how many users are female. In reality the criteria may be more complicated, but they always require examining each entity.
Here is a simplified version of how I am iterating over entities:
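```python
import logging

from google.appengine.api import taskqueue

MAX_FETCH = 100

def handle_count_female_users(cursor=None, counter=0):
    q = User.all()  # User is the application's db.Model subclass
    if cursor:
        q.with_cursor(cursor)
    users = q.fetch(MAX_FETCH)
    count_of_female_users = sum(1 for user in users if user.gender == 'female')
    # Task parameters arrive as strings, so convert before adding.
    total_count = int(counter) + count_of_female_users
    if len(users) == MAX_FETCH:
        # A full batch means more entities may remain: continue in a new task.
        taskqueue.Task(
            url='/count_female_users',
            method='GET',  # tasks default to POST; the router handles GET /foo
            params={
                'counter': str(total_count),
                'cursor': q.cursor(),
            },
        ).add()
    else:
        # A short batch was the last one, so we finally have the result.
        logging.info("We have %s female users in total." % total_count)
```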
I have routing code that automatically maps GET /foo to the function handle_foo, which I've found convenient. Even so, as you can see, much of the code above exists only to support the looping and has almost nothing to do with what I actually want to accomplish.
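For readers unfamiliar with that kind of convention, it can be sketched as a small dispatcher. This is a hypothetical illustration, not the poster's actual routing code: `dispatch` is an invented name, the handler body is a placeholder, and the request is reduced to a path plus a dict of query parameters instead of a real webapp request object.

```python
def handle_count_female_users(cursor=None, counter=0):
    # Placeholder body; the real handler would query the datastore.
    return "counting from cursor=%r, counter=%r" % (cursor, counter)

def dispatch(path, params, namespace=None):
    """Hypothetical router: map GET /foo to handle_foo(**params)."""
    namespace = globals() if namespace is None else namespace
    name = "handle_" + path.strip("/").replace("/", "_")
    handler = namespace.get(name)
    if handler is None or not callable(handler):
        raise LookupError("no handler for %s" % path)
    return handler(**params)

print(dispatch("/count_female_users", {"cursor": "abc", "counter": "100"}))
```

Note that query parameters arrive as strings, so a handler has to convert numeric values such as `counter` itself.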
What I would really like to write is something like:

```python
tally_entities(
    entity_class=User,
    filter_criteria=lambda user: user.gender == 'female',
    result=lambda count: logging.info("We have %s female users in total" % count),
)
```
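One way to get closer to that ideal is to separate the generic batching loop from the per-tally logic. A complication is that lambdas cannot be sent through task-queue parameters, so the sketch below (one possible design, not an App Engine API) registers each tally under a string name and carries only the name, cursor and running count between steps. `register_tally`, `tally_step`, `run_locally` and `fetch_users` are hypothetical names; `fetch_users(cursor, limit)` stands in for the cursor query in the question, and an in-memory list replaces the datastore so the sketch runs anywhere.

```python
from collections import namedtuple

TALLIES = {}

def register_tally(name, fetch_page, filter_criteria, result):
    """Register a tally by name so that tasks only need to carry strings."""
    TALLIES[name] = (fetch_page, filter_criteria, result)

def tally_step(name, cursor=None, counter=0, batch_size=100):
    """Process one batch. Return (next_cursor, total) if more may remain,
    or None after calling the tally's result function."""
    fetch_page, filter_criteria, result = TALLIES[name]
    entities, next_cursor = fetch_page(cursor, batch_size)
    total = int(counter) + sum(1 for e in entities if filter_criteria(e))
    if len(entities) == batch_size:
        return next_cursor, total
    result(total)
    return None

def run_locally(name, batch_size=100):
    """Drive tally_step in a loop; on App Engine each iteration would
    instead be a separate task carrying name, cursor and counter."""
    state = (None, 0)
    while state is not None:
        state = tally_step(name, state[0], state[1], batch_size)

# In-memory stand-in for the datastore (assumption, for illustration only).
User = namedtuple("User", ["name", "gender"])
users = [User("u%d" % i, "female" if i % 3 == 0 else "male") for i in range(250)]

def fetch_users(cursor, limit):
    """Return one page of users and an opaque cursor for the next page."""
    start = int(cursor or 0)
    page = users[start:start + limit]
    return page, str(start + len(page))

results = []
register_tally("female_users", fetch_users,
               lambda u: u.gender == "female", results.append)
run_locally("female_users", batch_size=100)
print(results)
```

On App Engine, the loop in `run_locally` would be replaced by a task handler that calls `tally_step` with the name, cursor and counter from its parameters and enqueues another task while it returns a new state. Each new statistic then needs only a `register_tally` call.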
Any ideas on how to get closer to this ideal, or is there an even better way?
This sounds like a good use case for the App Engine MapReduce library:
http://code.google.com/p/appengine-mapreduce/wiki/GettingStartedInPython
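With MapReduce, the per-entity work becomes a mapper function that the framework runs over every entity, and aggregate counts are typically reported through counters. The sketch below only imitates that shape in plain Python so it runs without App Engine: `female_user_mapper` yields counter-increment records and `run_mapper` plays the role of the framework. Both names and the `Increment` tuple are illustrative stand-ins, not the library's API, which is described in the Getting Started guide linked above.

```python
from collections import Counter, namedtuple

# Illustrative stand-ins, not classes from the MapReduce library.
Increment = namedtuple("Increment", ["counter_name", "delta"])
User = namedtuple("User", ["name", "gender"])

def female_user_mapper(user):
    """Mapper: called once per entity; yields counter updates."""
    if user.gender == "female":
        yield Increment("female_users", 1)

def run_mapper(mapper, entities):
    """Toy framework: apply the mapper to each entity and sum counters.
    The real library shards this work across many tasks."""
    counters = Counter()
    for entity in entities:
        for op in mapper(entity):
            counters[op.counter_name] += op.delta
    return counters

users = [User("alice", "female"), User("bob", "male"), User("carol", "female")]
print(run_mapper(female_user_mapper, users)["female_users"])
```

The mapper contains only the per-entity criterion; batching, cursors and continuation tasks become the framework's job, which is exactly the plumbing the question wants to get rid of.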