Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9098757
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 17, 20262026-06-17T00:26:20+00:00 2026-06-17T00:26:20+00:00

I have been having a persistent problem getting an rss feed from a particular

  • 0

I have been having a persistent problem getting an rss feed from a particular website. I wound up writing a rather ugly procedure to perform this function, but I am curious why this happens and whether any higher level interfaces handle this problem properly. This problem isn’t really a show stopper, since I don’t need to retrieve the feed very often.

I have read a solution that traps the exception and returns the partial content, yet since the incomplete reads differ in the amount of bytes that are actually retrieved, I have no certainty that such solution will actually work.

#!/usr/bin/env python
import os
import sys
import feedparser
from mechanize import Browser
import requests
import urllib2
from httplib import IncompleteRead

url = 'http://hattiesburg.legistar.com/Feed.ashx?M=Calendar&ID=543375&GUID=83d4a09c-6b40-4300-a04b-f88884048d49&Mode=2013&Title=City+of+Hattiesburg%2c+MS+-+Calendar+(2013)'

content = feedparser.parse(url)
if 'bozo_exception' in content:
    print content['bozo_exception']
else:
    print "Success!!"
    sys.exit(0)

print "If you see this, please tell me what happened."

# try using mechanize
b = Browser()
r = b.open(url)
try:
    r.read()
except IncompleteRead, e:
    print "IncompleteRead using mechanize", e

# try using urllib2
r = urllib2.urlopen(url)
try:
    r.read()
except IncompleteRead, e:
    print "IncompleteRead using urllib2", e


# try using requests
try:
    r = requests.request('GET', url)
except IncompleteRead, e:
    print "IncompleteRead using requests", e

# this function is old and I categorized it as ...
# "at least it works darnnit!", but I would really like to 
# learn what's happening.  Please help me put this function into
# eternal rest.
def get_rss_feed(url):
    response = urllib2.urlopen(url)
    read_it = True
    content = ''
    while read_it:
        try:
            content += response.read(1)
        except IncompleteRead:
            read_it = False
    return content, response.info()


content, info = get_rss_feed(url)

feed = feedparser.parse(content)

As already stated, this isn’t a mission critical problem, yet a curiosity, as even though I can expect urllib2 to have this problem, I am surprised that this error is encountered in mechanize and requests as well. The feedparser module doesn’t even throw an error, so checking for errors depends on the presence of a ‘bozo_exception’ key.

Edit: I just wanted to mention that both wget and curl perform the function flawlessly, retrieving the full payload correctly every time. I have yet to find a pure python method to work, excepting my ugly hack, and I am very curious to know what is happening on the backend of httplib. On a lark, I decided to also try this with twill the other day and got the same httplib error.

P.S. There is one thing that also strikes me as very odd. The IncompleteRead happens consistently at one of two breakpoints in the payload. It seems that feedparser and requests fail after reading 926 bytes, yet mechanize and urllib2 fail after reading 1854 bytes. This behavior is consistend, and I am left without explanation or understanding.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-17T00:26:21+00:00Added an answer on June 17, 2026 at 12:26 am

    At the end of the day, all of the other modules (feedparser, mechanize, and urllib2) call httplib which is where the exception is being thrown.

    Now, first things first, I also downloaded this with wget and the resulting file was 1854 bytes. Next, I tried with urllib2:

    >>> import urllib2
    >>> url = 'http://hattiesburg.legistar.com/Feed.ashx?M=Calendar&ID=543375&GUID=83d4a09c-6b40-4300-a04b-f88884048d49&Mode=2013&Title=City+of+Hattiesburg%2c+MS+-+Calendar+(2013)'
    >>> f = urllib2.urlopen(url)
    >>> f.headers.headers
    ['Cache-Control: private\r\n',
     'Content-Type: text/xml; charset=utf-8\r\n',
     'Server: Microsoft-IIS/7.5\r\n',
     'X-AspNet-Version: 4.0.30319\r\n',
     'X-Powered-By: ASP.NET\r\n',
     'Date: Mon, 07 Jan 2013 23:21:51 GMT\r\n',
     'Via: 1.1 BC1-ACLD\r\n',
     'Transfer-Encoding: chunked\r\n',
     'Connection: close\r\n']
    >>> f.read()
    < Full traceback cut >
    IncompleteRead: IncompleteRead(1854 bytes read)
    

    So it is reading all 1854 bytes but then thinks there is more to come. If we explicitly tell it to read only 1854 bytes it works:

    >>> f = urllib2.urlopen(url)
    >>> f.read(1854)
    '\xef\xbb\xbf<?xml version="1.0" encoding="utf-8"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">...snip...</rss>'
    

    Obviously, this is only useful if we always know the exact length ahead of time. We can use the fact the partial read is returned as an attribute on the exception to capture the entire contents:

    >>> try:
    ...     contents = f.read()
    ... except httplib.IncompleteRead as e:
    ...     contents = e.partial
    ...
    >>> print contents
    '\xef\xbb\xbf<?xml version="1.0" encoding="utf-8"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">...snip...</rss>'
    

    This blog post suggests this is a fault of the server, and describes how to monkey-patch the httplib.HTTPResponse.read() method with the try..except block above to handle things behind the scenes:

    import httplib
    
    def patch_http_response_read(func):
        def inner(*args):
            try:
                return func(*args)
            except httplib.IncompleteRead, e:
                return e.partial
    
        return inner
    
    httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)
    

    I applied the patch and then feedparser worked:

    >>> import feedparser
    >>> url = 'http://hattiesburg.legistar.com/Feed.ashx?M=Calendar&ID=543375&GUID=83d4a09c-6b40-4300-a04b-f88884048d49&Mode=2013&Title=City+of+Hattiesburg%2c+MS+-+Calendar+(2013)'
    >>> feedparser.parse(url)
    {'bozo': 0,
     'encoding': 'utf-8',
     'entries': ...
     'status': 200,
     'version': 'rss20'}
    

    This isn’t the nicest way of doing things, but it seems to work. I’m not expert enough in the HTTP protocols to say for sure whether the server is doing things wrong, or whether httplib is mis-handling an edge case.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have been having trouble pulling up a custom UIPickerView from the textfield's inputview
I have been having this annoying problem when trying to implement a picture gallery
I have been having the following problem, i think it's probably due to the
We have been having serious trouble getting an application we devlop running with UAC
I have been having this problem in IE. I have two divs which I
I have been having troubles getting my makefiles to work the way I want.
I have been having problem using phpmailer to send bulk emails, no problem with
I have been having some inexplicable SFINAE problems in a program I'm writing, so
I have been having this problem for a couple of weeks now. The problem
I have been having these really odd problems with Visual Studio 2010. At this

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.