I have a cron job that runs every day, calls an API and fetches some data. For each row of the data I add a task to the task queue to process it (which involves looking up more data via further APIs). Once all of this has finished, my data doesn’t change for the next 24 hours, so I memcache it.
Is there a way of knowing when all the tasks I queued up have finished so that I can cache the data?
Currently I do it in a really messy fashion by just scheduling two cron jobs like this:
class fetchdata(webapp.RequestHandler):
    def get(self):
        todaykey = str(date.today())
        memcache.delete(todaykey)
        topsyurl = 'http://otter.topsy.com/search.json?q=site:open.spotify.com/album&window=d&perpage=20'
        f = urllib.urlopen(topsyurl)
        response = f.read()
        f.close()
        d = simplejson.loads(response)
        albums = d['response']['list']
        for album in albums:
            taskqueue.add(url='/spotifyapi/', params={'url':album['url'], 'score':album['score']})

class flushcache(webapp.RequestHandler):
    def get(self):
        todaykey = str(date.today())
        memcache.delete(todaykey)
Then my cron.yaml looks like this:
cron:
- description: gettopsy
  url: /fetchdata/
  schedule: every day 01:00
  timezone: Europe/London
- description: flushcache
  url: /flushcache/
  schedule: every day 01:05
  timezone: Europe/London
Basically, I’m guessing that all my tasks will finish within 5 minutes, so I flush the cache 5 minutes after the fetch starts. That way, the next time the data is cached it should be complete.
Is there a better way of coding this? It feels like my solution isn’t the best one.
Thanks
Tom
There’s currently no built-in way to determine when all of your tasks have finished executing. Your best option is to insert marker records in the datastore, one per task, and have each task delete its own record when it’s done. Each task can then check whether any markers remain and, if it was the last one, perform your cleanup and caching.
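To make the marker idea concrete, here is a minimal sketch of the bookkeeping only. It uses an in-memory dict and a lock as a stand-in for the datastore so that it runs anywhere; `MarkerStore`, `create_markers`, `complete` and `run_batch` are hypothetical names invented for this illustration, not App Engine APIs. It also assumes the album URLs are unique, since they are used as marker ids. In a real app the markers would be datastore entities, and the "delete my marker, then check whether any are left" step would need to be atomic (for example, inside a datastore transaction) so that exactly one task sees itself as the last.

```python
import threading


class MarkerStore(object):
    """In-memory stand-in for datastore marker records (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # batch key -> set of marker ids still outstanding

    def create_markers(self, batch_key, marker_ids):
        # Write every marker BEFORE enqueuing any task, so that an early
        # task cannot find the set empty and finish the batch prematurely.
        with self._lock:
            self._pending[batch_key] = set(marker_ids)

    def complete(self, batch_key, marker_id):
        """Delete one marker. Returns True only for the call that removed the last one."""
        with self._lock:
            remaining = self._pending.get(batch_key)
            if remaining is None or marker_id not in remaining:
                # Unknown batch, or this marker was already removed
                # (for example, a task that was retried).
                return False
            remaining.discard(marker_id)
            if remaining:
                return False
            del self._pending[batch_key]
            return True


def run_batch(store, batch_key, urls, on_all_done):
    """Simulate the cron handler plus its tasks, run one after another."""
    store.create_markers(batch_key, urls)
    for url in urls:
        # In the real app this would be the end of the /spotifyapi/ task handler.
        if store.complete(batch_key, url):
            on_all_done(batch_key)
```

With this in place, the second cron job is no longer needed: whichever task removes the last marker runs the callback, which in the question's case would flush or repopulate the memcache entry for that day. Returning False for an already-removed marker means a retried task cannot trigger the cleanup a second time.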