I’m ready to rock and roll with my little mapreduce routine (Python, Google App Engine), but I’m nervous that a bug in it could corrupt my datastore table. My processing function looks like this:
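from mapreduce import operation as op

def promote(nrhp_aux_entity):
    ...
    # I form a query and use it to get an "nrhp_record". That's the item
    # I'm actually changing.
    results1 = query1.fetch(limit=1)
    nrhp_record = results1[0]
    ...
    # Hand the modified record to the mapreduce framework to be written.
    yield op.db.Put(nrhp_record)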
I’d like to have it run on just a small number of nrhp_aux_entity objects and then exit. Then I can look over the results and decide whether to let it work on the whole table. So would a good plan be to have a global counter of some sort, and then exit the whole mapreduce when the counter reaches some small number, say 5? And if so, what’s a good way to implement the global counter?
And if I do this, I expect my mapreduce will be finished in just a minute or so, right? It would only be operating on 5 of the roughly 76,000 entities in my table.
I would copy a few entities to a new kind and set the mapper going on that new kind. Implementing a counter that works at high contention is harder than just making a separate test environment, and the test environment has the benefit of not working on any real data.
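Here is a rough sketch of the "copy a few entities and test on the copies" idea. It uses plain Python objects rather than the real datastore API, so everything in it is an assumption made for illustration: FakeEntity, copy_sample, the kind names "NrhpAux" and "NrhpAuxTest", and the placeholder body of promote are all hypothetical. On App Engine you would fetch the sample with a query limit and store the copies under a separate kind instead.

```python
import copy
from itertools import islice


class FakeEntity(object):
    """Hypothetical in-memory stand-in for a datastore entity."""

    def __init__(self, kind, key_name, **props):
        self.kind = kind
        self.key_name = key_name
        self.props = dict(props)


def copy_sample(entities, test_kind, limit=5):
    """Deep-copy the first `limit` entities and relabel them with a test kind."""
    sample = []
    for entity in islice(entities, limit):
        clone = copy.deepcopy(entity)
        clone.kind = test_kind  # the copy lives under its own kind
        sample.append(clone)
    return sample


def promote(entity):
    """Placeholder mapper: flags the entity and yields it as the item to write."""
    entity.props["promoted"] = True
    yield entity


# 76 fake "production" entities; the mapper only ever sees 5 copies.
production = [
    FakeEntity("NrhpAux", "e%d" % i, promoted=False) for i in range(76)
]
test_copies = copy_sample(production, "NrhpAuxTest", limit=5)
written = [out for entity in test_copies for out in promote(entity)]
```

Because the mapper only touches the copies, a bug in promote can at worst damage the throwaway test kind, and no counter shared between mapper tasks is needed.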
Google also just released useful backup / restore features you might want to use before setting your mapper loose on all of your production data!
Depending on your queue settings and how long a single mapping task takes, I would expect a mapreduce over 5 entities to take very little time. Like… 200ms.