
The Archive Base


Editorial Team
Asked: June 14, 2026


I’m trying to write a Python program that will grab and display any RSS updates since the last time the program was run. I am using feedparser and trying to use ETags and Last-Modified as described on SO, but my test script does not seem to be working.

import feedparser
rsslist=["http://skottieyoung.tumblr.com/rss","http://mrjakeparker.com/feed/"]
for feed in rsslist:
    print('--------'+feed+'-------')
    d=feedparser.parse(feed)
    print(len(d.entries))
    if (len(d.entries) > 0):
        etag=d.feed.get('etag','')
        modified=d.get('modified',d.get('updated',d.entries[0].get('published','no modified,update or published fields present in rss')))

        d2=feedparser.parse(feed,modified)
        if (len(d2.entries) > 0):
            etag2=d2.feed.get('etag','')
            modified2=d2.get('updated',d2.entries[0].get('published',''))

        if (d2==d): #ideally we would never see this bc etags/last modified would prevent unnecessarily downloading what we already have.
            print("Arrg these are the same")

I’m honestly not sure whether RSS/XML technology has changed since the references I’ve been using online were written, or whether there is a problem with my code.

Regardless, I’m looking for the best way to consume RSS feeds efficiently. Specifically, I want to minimize wasted bandwidth, which is what the Last-Modified and ETag fields are intended to do.

Thanks in advance.


1 Answer

    Editorial Team
    Added an answer on June 14, 2026 at 1:32 am

    Your issue is that you are passing the last-modified date where the etag belongs. The etag is the second argument to the parse() function and modified is the third, so the date has to be passed by keyword.

    Instead of:

    d2=feedparser.parse(feed,modified)
    

    Do:

    d2=feedparser.parse(feed,modified=modified)
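To see why the positional call goes wrong, here is a hypothetical stand-in, fake_parse, whose first three parameters are in the same order as feedparser.parse (url_file_stream_or_string, etag, modified). It is an illustration only: it fetches nothing and just reports which parameter each value was bound to.

```python
def fake_parse(url_file_stream_or_string, etag=None, modified=None):
    """Hypothetical stand-in with the same leading parameter order as feedparser.parse.

    It performs no network I/O; it only reports how the arguments were bound.
    """
    return {"etag": etag, "modified": modified}


last_modified = "Wed, 07 Nov 2012 19:27:48 GMT"

# Positional: the date fills the second slot, which is etag.
wrong = fake_parse("http://mrjakeparker.com/feed/", last_modified)
# Keyword: the date is bound to modified, where it belongs.
right = fake_parse("http://mrjakeparker.com/feed/", modified=last_modified)

print(wrong)  # {'etag': 'Wed, 07 Nov 2012 19:27:48 GMT', 'modified': None}
print(right)  # {'etag': None, 'modified': 'Wed, 07 Nov 2012 19:27:48 GMT'}
```

With the real feedparser.parse, the positional form would send the date as an entity tag, which will typically never match the server's ETag, and no modified date at all, so the full feed comes back every time.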
    

    After looking at the source code, it appears that the only thing passing etag or modified to the parse() function does is send the corresponding conditional request headers (If-None-Match and If-Modified-Since), so that the server can reply with 304 Not Modified and an empty body if nothing has changed. If the server does not support this, it simply returns the full RSS feed. I would modify your code to check the date of each entry and ignore any entry whose date is not later than the newest date from the previous request:

    import feedparser
    rsslist = ["http://skottieyoung.tumblr.com/rss", "http://mrjakeparker.com/feed/"]

    def feed_modified_date(feed):
        # This is the Last-Modified value from the HTTP response headers.
        # Do not confuse it with the dates inside the feed: the server may use
        # a different timezone in its Last-Modified header than it uses for
        # the publish dates. Returns None if no Last-Modified header was sent.
        return feed.get('modified')

    def max_entry_date(feed):
        # published_parsed is a time.struct_time, which compares like a tuple
        entry_pub_dates = (e.get('published_parsed') for e in feed.entries)
        entry_pub_dates = tuple(e for e in entry_pub_dates if e is not None)

        if len(entry_pub_dates) > 0:
            return max(entry_pub_dates)

        return None

    def entries_with_dates_after(feed, date):
        response = []

        for entry in feed.entries:
            published = entry.get('published_parsed')
            if published is None:
                continue  # no usable date, so the entry cannot be compared
            # with no previous date to compare against, every dated entry is new
            if date is None or published > date:
                response.append(entry)

        return response

    for feed_url in rsslist:
        print('--------%s-------' % feed_url)
        d = feedparser.parse(feed_url)
        print('feed length %i' % len(d.entries))

        if len(d.entries) > 0:
            # the HTTP ETag is stored on the parse result itself, not on d.feed
            etag = d.get('etag')
            modified = feed_modified_date(d)
            print('modified at %s' % modified)

            d2 = feedparser.parse(feed_url, etag=etag, modified=modified)
            print('second feed length %i' % len(d2.entries))
            if len(d2.entries) > 0:
                print("server does not support etags or there are new entries")
                # perhaps the server does not support etags or last-modified;
                # filter the entries ourselves
                prev_max_date = max_entry_date(d)

                entries = entries_with_dates_after(d2, prev_max_date)

                print('%i new entries' % len(entries))
            else:
                print('there are no entries')

    This produces:

    --------http://skottieyoung.tumblr.com/rss-------
    feed length 20
    modified at None
    second feed length 20
    server does not support etags or there are new entries
    0 new entries
    --------http://mrjakeparker.com/feed/-------
    feed length 10
    modified at Wed, 07 Nov 2012 19:27:48 GMT
    second feed length 0
    there are no entries
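The script above compares two fetches made within a single run. To report updates since the last time the program ran, as the question asks, the ETag and Last-Modified values have to be stored between runs. The following is a minimal sketch under stated assumptions: the state file name feed_state.json and the helpers load_state, save_state and fetch_if_changed are hypothetical; it relies only on feedparser.parse's etag/modified keyword arguments and the status, etag, modified and entries keys of its result. The parse function is a parameter so the logic can be exercised without network access.

```python
import json
from pathlib import Path

# Hypothetical location for the per-feed validators kept between runs.
STATE_FILE = Path("feed_state.json")


def load_state(path=STATE_FILE):
    """Return the {feed_url: {"etag": ..., "modified": ...}} mapping from the last run."""
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        return {}  # first run: nothing has been saved yet


def save_state(state, path=STATE_FILE):
    """Write the validators so the next run can send them back to the server."""
    Path(path).write_text(json.dumps(state, indent=2))


def fetch_if_changed(url, state, parse=None):
    """Fetch url with the saved validators; return its entries, or [] on a 304."""
    if parse is None:
        import feedparser  # third-party; imported lazily so parse can be swapped out
        parse = feedparser.parse
    saved = state.get(url, {})
    d = parse(url, etag=saved.get("etag"), modified=saved.get("modified"))
    if d.get("status") == 304:
        return []  # server says nothing changed; no feed body was downloaded
    # Remember the new validators; either may be None if the server omits it.
    state[url] = {"etag": d.get("etag"), "modified": d.get("modified")}
    return d.get("entries", [])
```

Typical use is `state = load_state()`, then `fetch_if_changed(url, state)` for each URL in rsslist, then `save_state(state)`. A server that ignores the validators still returns every entry, so the date filtering from the answer is still needed on top of this.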
    


Related Questions

I'm parsing an RSS feed that has an ’ in it. SimpleXML turns this
I am trying to understand how to use SyndicationItem to display feed which is
I'm trying to convert HTML to plain text. I get many &#8217; &#8220; etc.
I need a function that will clean a strings' special characters. I do NOT
I'm trying to create an if statement in PHP that prevents a single post
Basically, what I'm trying to create is a page of div tags, each has
link Im having trouble converting the html entites into html characters, (&# 8217;) i
That's pretty much it. I'm using Nokogiri to scrape a web page what has
I am trying to find ID3V2 tags from MP3 file using jid3lib in Java.
this is what i have right now Drawing an RSS feed into the php,


© 2021 The Archive Base. All Rights Reserved