Is there a tool or framework that supports periodically polling various resources? For example, I have in mind an RSS aggregator that checks for new content once a day, or a tool that maintains a cache of users’ Twitter avatars by polling their Twitter accounts once a week.
I’m not looking for tools to perform the actual fetching or feed processing; I’m looking for something that would store the date of the last fetch, wake up when the next one is due, etc.
Messaging tools like Resque and Delayed Job are optimised for “time-shifting” specific incoming requests rather than handling periodic tasks. In other words, I don’t think you’d want to keep a perpetual job around for every user to retrieve their Twitter avatar. But I stand to be corrected :D.
Anacron is great for this. We have it wake up once a day to trigger background fetching. The background fetcher queries the SQL database for the next N users who need updates, then processes that batch.
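To make the batch step concrete, here is a rough sketch of what such a fetcher could look like. It is not our actual code: the `users` table, its `last_fetched_at` column, the function names, and the use of SQLite are all assumptions made for illustration, and `fetch` stands in for whatever does the real retrieval.

```python
import sqlite3
import time

WEEK_SECONDS = 7 * 24 * 60 * 60  # assumed refresh interval: once a week


def create_schema(conn):
    # Hypothetical schema: one row per user plus the time of the last fetch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        " id INTEGER PRIMARY KEY,"
        " handle TEXT NOT NULL,"
        " last_fetched_at INTEGER)"  # epoch seconds; NULL means never fetched
    )


def next_due_users(conn, now, interval=WEEK_SECONDS, limit=100):
    """Return up to `limit` (id, handle) rows whose last fetch is at least `interval` old."""
    cutoff = now - interval
    return conn.execute(
        "SELECT id, handle FROM users"
        " WHERE last_fetched_at IS NULL OR last_fetched_at <= ?"
        # SQLite sorts NULLs first, so never-fetched users come before stale ones.
        " ORDER BY last_fetched_at, id"
        " LIMIT ?",
        (cutoff, limit),
    ).fetchall()


def run_batch(conn, fetch, now=None, interval=WEEK_SECONDS, limit=100):
    """Fetch one batch of due users and record the fetch time for each."""
    now = int(time.time()) if now is None else now
    done = []
    for user_id, handle in next_due_users(conn, now, interval, limit):
        fetch(handle)  # caller-supplied; the actual fetching is out of scope here
        conn.execute(
            "UPDATE users SET last_fetched_at = ? WHERE id = ?", (now, user_id)
        )
        done.append(handle)
    conn.commit()
    return done
```

The scheduler (Anacron in our case) would simply run a script that calls `run_batch` once a day. Because the due date lives in the database rather than in a per-user job, a missed run just means a larger backlog the next time it wakes up.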
http://en.wikipedia.org/wiki/Anacron
“It performs periodic command scheduling which is traditionally done by cron, but without assuming that the system is running continuously. Thus, it can be used to control the execution of daily, weekly, and monthly jobs on systems that don’t run 24 hours a day.
Anacron makes sure that these commands are run at the specified intervals as closely as machine uptime permits.”