I have an application that polls several RSS sources on the web.
What is the etiquette when polling other people's web servers? How frequently should I poll, etc.?
What are the best practices?
Make use of HTTP caching. Remember the ETag and Last-Modified headers from each response, send them back as If-None-Match and If-Modified-Since on the next request, and recognize the 304 Not Modified response. This way you can save a lot of bandwidth. Additionally, some scripts recognize the If-Modified-Since header and return only partial contents (i.e. only the two or three newest items instead of all 30 or so).
Don't poll RSS from services that support RPC ping (or another push service, such as PubSubHubbub). That is, if you're receiving push notifications from a service, you don't have to poll its data at the standard interval. Poll it once a day just to check whether the mechanism still works (the ping can be disabled, reconfigured, broken, etc.). This way you fetch the RSS only when you receive a notification, not every hour or so.
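A minimal sketch of the conditional GET described above, using only Python's standard library. FeedCache, conditional_headers and fetch_feed are hypothetical names chosen for illustration, and persisting the cache between runs is left out.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedCache:
    """Validators and body remembered from the last successful (200) fetch."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: Optional[bytes] = None


def conditional_headers(cache: FeedCache) -> dict:
    """Echo the server's own validators back so it can answer 304 Not Modified."""
    headers = {}
    if cache.etag:
        headers["If-None-Match"] = cache.etag
    if cache.last_modified:
        headers["If-Modified-Since"] = cache.last_modified
    return headers


def fetch_feed(url: str, cache: FeedCache, timeout: float = 30.0) -> bytes:
    """Return the feed body, reusing the cached copy when the server says 304."""
    request = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            # Remember the new validators (or clear stale ones) for the next poll.
            cache.etag = response.headers.get("ETag")
            cache.last_modified = response.headers.get("Last-Modified")
            cache.body = body
            return body
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 304; it means "nothing changed, reuse your copy".
        if err.code == 304 and cache.body is not None:
            return cache.body
        raise
```

Keep one FeedCache per feed URL so that every poll after the first can be answered with a body-less 304 when nothing has changed.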
Check the TTL (the ttl element in RSS) or the HTTP cache headers such as Expires (for Atom, which has no ttl element), and don't fetch until the resource expires.
Try to adapt to the frequency of new items in each individual feed. If a particular feed had only two updates in the past week, don't fetch it more than once a day. AFAIR, Google Reader does that.
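The TTL, push and adaptive-frequency tips can be combined into one scheduling function. This is a sketch under stated assumptions: the one-hour floor and one-day ceiling (MIN_INTERVAL, MAX_INTERVAL), the one-week history window and all function names are illustrative choices, not a standard, and now must be a timezone-aware datetime.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime
from typing import Optional, Sequence

MIN_INTERVAL = timedelta(hours=1)  # assumption: never poll a feed more than hourly
MAX_INTERVAL = timedelta(days=1)   # assumption: even quiet feeds get checked daily


def interval_from_history(item_times: Sequence[datetime], now: datetime) -> timedelta:
    """Match the polling interval to how often the feed published in the last week."""
    week_ago = now - timedelta(days=7)
    recent = [t for t in item_times if t >= week_ago]
    if not recent:
        return MAX_INTERVAL
    average_gap = timedelta(days=7) / len(recent)
    return min(max(average_gap, MIN_INTERVAL), MAX_INTERVAL)


def next_poll_time(
    now: datetime,
    item_times: Sequence[datetime],
    rss_ttl_minutes: Optional[int] = None,
    expires_header: Optional[str] = None,
    push_enabled: bool = False,
) -> datetime:
    """Earliest time the feed should be fetched again."""
    if push_enabled:
        # Fetches are driven by notifications; poll daily only to verify push still works.
        return now + MAX_INTERVAL
    earliest = now + interval_from_history(item_times, now)
    if rss_ttl_minutes is not None:
        # Honour the RSS <ttl> value: do not come back before it runs out.
        earliest = max(earliest, now + timedelta(minutes=rss_ttl_minutes))
    if expires_header:
        try:
            expires = parsedate_to_datetime(expires_header)
        except (TypeError, ValueError):
            expires = None  # unparseable header: ignore it
        if expires is not None and expires.tzinfo is not None:
            earliest = max(earliest, expires)
    return earliest
```

With only two items in the past week the average gap is 3.5 days, so the interval is capped at one day, which matches the once-a-day rule above.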
Lower the rate during night hours or at other times when traffic on your site is low.
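Slowing down during quiet hours can be a simple multiplier on whatever interval the scheduler picked. A sketch, assuming a 22:00 to 06:00 local low-traffic window and a factor of three; both values and the function names are hypothetical and should come from your own traffic data.

```python
from datetime import datetime, timedelta

# Assumptions: your own low-traffic window (local time) and how much to slow down.
QUIET_START_HOUR = 22
QUIET_END_HOUR = 6
QUIET_MULTIPLIER = 3


def in_quiet_hours(hour: int, start: int = QUIET_START_HOUR, end: int = QUIET_END_HOUR) -> bool:
    """True if hour falls in [start, end), allowing the window to wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def adjust_for_quiet_hours(interval: timedelta, local_now: datetime) -> timedelta:
    """Poll less often while traffic on your own site is low."""
    if in_quiet_hours(local_now.hour):
        return interval * QUIET_MULTIPLIER
    return interval
```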
Failing all that, just poll once an hour. 😉