I’ve got a system that fetches a few hundred RSS feeds. Currently they’re on a 10-minute refresh cycle, but I’d like to make that faster. What is a good strategy for fetching the RSS sources at near-realtime/push intervals?
Some solutions I’ve come across:
- do a fetch at 1 minute; if no changes, fetch again at 2, then 4, then 8, etc.
- find the average time-between-updates interval/variance of the RSS feed, and put them in a bucket (this one updates every 3 mins, so do a check every 1 minute; this one updates every week, so do a check every day, etc.)
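A rough Python sketch of these two ideas, in case it helps to compare them. The function names, the 60-minute cap, the divide-by-3 rule, and the one-day ceiling are all assumptions chosen for illustration, not taken from any library:

```python
from statistics import mean


def next_backoff(current_minutes, changed, base=1, cap=60):
    """Doubling backoff: 1, 2, 4, 8, ... minutes while nothing changes.

    Resets to `base` as soon as a change is seen. `cap` is an assumed
    upper bound so a quiet feed is still checked occasionally.
    """
    if changed:
        return base
    return min(current_minutes * 2, cap)


def bucket_interval(update_times, divisor=3, floor=1, ceiling=24 * 60):
    """Pick a polling interval (minutes) from a feed's update history.

    `update_times` is a sorted list of past update timestamps in minutes.
    The interval is the mean gap between updates divided by `divisor`
    (an assumed rule of thumb), clamped to [floor, ceiling].
    """
    gaps = [b - a for a, b in zip(update_times, update_times[1:])]
    if not gaps:
        # Not enough history yet: fall back to the slowest bucket.
        return ceiling
    return max(floor, min(ceiling, mean(gaps) / divisor))
```

With these assumed defaults, a feed that updates every 3 minutes would be polled every minute, and a feed that updates weekly would hit the one-day ceiling.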
I’ve used something like your first option. Start with a default waiting time before retrieving a feed. If new items are found, reduce the waiting period by 10%; otherwise, increase it by 10%. Apply this adjustment after every fetch and the system tunes itself to each feed.
You could use different percentages, e.g. shrink the interval faster than you grow it, so the system reacts more quickly when a feed starts updating more often.
Clamp the interval to a minimum and a maximum so the waiting time stays within a predefined range.
It’s not perfect, but it was good enough for me.
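For what it's worth, the adjustment step can be sketched in a few lines of Python. This is only an illustration under assumed values: the function name and the 60-second and 1-hour bounds are made up, and only the 10% steps come from the description above.

```python
def adapt_interval(interval, found_new, decrease=0.10, increase=0.10,
                   minimum=60.0, maximum=3600.0):
    """Return the next waiting period in seconds for one feed.

    interval  -- the waiting period used for the fetch that just finished
    found_new -- True if that fetch returned new items
    decrease / increase -- fractional step sizes (10% each by default)
    minimum / maximum   -- assumed bounds that keep the wait in range
    """
    if found_new:
        interval *= 1.0 - decrease   # feed is active: poll sooner
    else:
        interval *= 1.0 + increase   # nothing new: back off a little
    # Clamp so the interval never leaves the predefined range.
    return max(minimum, min(maximum, interval))


# Example: start from the current 10-minute cycle.
wait = 600.0
wait = adapt_interval(wait, found_new=True)    # shorter wait next time
wait = adapt_interval(wait, found_new=False)   # slightly longer again
```

Passing a larger `decrease` than `increase` gives the asymmetric behaviour mentioned above, where the interval shrinks quickly when a feed becomes active and grows back slowly.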