I need to take a few RSS feeds and archive all the items that get added to them. I’ve never consumed or created RSS before, but I know XML, so the format seems pretty intuitive.
I know how to parse the feed: How can I get started making a C# RSS Reader?
I know I can’t rely on the feed server to provide a complete history: Is it possible to get RSS archive
I know I’ll have to have some custom logic around duplicates: how to check uniqueness (non duplication) of a post in an rss feed
My question is, how can I ensure I don’t miss any items? My initial plan is to write a parser, where for each item in the feed:
1) Check to see if it’s already in the archive database
2) If not, add it to the database
If I schedule this to run once a day, can I be confident I won’t be missing any items?
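The two steps above can be sketched roughly as follows. This is only an illustration in Python rather than C#, assuming an RSS 2.0 feed without namespaces, a SQLite archive, and that each item's `guid` (falling back to `link`) is a good enough identity. The table layout and the names `open_archive` and `archive_feed` are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_archive(path=":memory:"):
    """Open (or create) the archive database, one row per feed item."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " item_key TEXT PRIMARY KEY,"
        " title TEXT, link TEXT, pub_date TEXT)"
    )
    return db


def archive_feed(db, rss_xml):
    """Archive every item not seen before; return how many were new."""
    root = ET.fromstring(rss_xml)
    added = 0
    for item in root.iter("item"):
        link = item.findtext("link")
        # Assumption: guid, or link when guid is absent, identifies the item.
        key = (item.findtext("guid") or link or "").strip()
        if not key:
            continue  # nothing stable to identify this item by

        # 1) Check to see if it's already in the archive database
        seen = db.execute(
            "SELECT 1 FROM items WHERE item_key = ?", (key,)
        ).fetchone()

        # 2) If not, add it to the database
        if seen is None:
            db.execute(
                "INSERT INTO items VALUES (?, ?, ?, ?)",
                (key, item.findtext("title"), link, item.findtext("pubDate")),
            )
            added += 1
    db.commit()
    return added
```

The count of new items returned by each run is worth keeping: if it ever equals the number of items in the feed, the run may have come too late to catch everything.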
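It depends on the feed. Some sites publish articles very frequently and may have their RSS feed configured to show only the 10 most recent articles; if more than 10 items are published between two of your runs, a once-a-day schedule will miss the older ones. Other sites publish rarely or keep a long list of items in the feed, and for those a daily run is plenty.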
Ideally your app should ‘learn’ the publishing frequency of each site and tune itself to poll that site accordingly. For example, if you see new unique articles every time you poll, you need to poll more often; if you see the same set of articles on several attempts in a row, you can back off the next time.
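One way to express that tuning is a small function that turns the result of the last poll into the next delay. This is a rough sketch, not a tested policy: the name `next_interval`, the thresholds, the multipliers, and the 5-minute and 24-hour bounds are all arbitrary assumptions you would adjust per feed.

```python
def next_interval(current_seconds, new_items, feed_size,
                  min_seconds=300, max_seconds=86400):
    """Choose the next polling delay from how much of the feed was new."""
    if feed_size > 0 and new_items >= feed_size:
        # Every item was new, so some may already have scrolled off the
        # feed before this poll: come back much sooner.
        proposed = current_seconds / 4
    elif new_items > feed_size / 2:
        # More than half the feed was new: poll more often.
        proposed = current_seconds / 2
    elif new_items == 0:
        # Same set of articles as last time: back off.
        proposed = current_seconds * 1.5
    else:
        proposed = current_seconds
    # Clamp so the poller is neither abusive nor effectively switched off.
    return int(max(min_seconds, min(max_seconds, proposed)))
```

For instance, with an hourly schedule, `next_interval(3600, 10, 10)` gives 900 seconds, while `next_interval(3600, 0, 10)` gives 5400. A poll in which every item was new is also the one case where you cannot rule out that something was missed, so it is worth logging.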