Question

Asked: May 15, 2026 (2026-05-15T00:51:28+00:00)

It’s failing with this error when I run it in Eclipse or when I run my script in IPython:

'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)

I don’t know why, but when I simply execute the feedparser.parse(url) statement on its own using the same URL, no error is thrown. This is stumping me big time.

The code is as simple as:

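      try:
          d = feedparser.parse(url)
      except Exception as e:  # the "as" form works on Python 2.6 and later
          logging.error('Error while retrieving feed.')
          logging.error(e)
          # formatExceptionInfo and formatExceptionInfo1 are my own traceback helpers
          logging.error(formatExceptionInfo(None))
          logging.error(formatExceptionInfo1())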

Here is the stack trace:

d = feedparser.parse(url)


 File "C:\Python26\lib\site-packages\feedparser.py", line 2623, in parse
    feedparser.feed(data)
  File "C:\Python26\lib\site-packages\feedparser.py", line 1441, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "C:\Python26\lib\sgmllib.py", line 104, in feed
    self.goahead(0)
  File "C:\Python26\lib\sgmllib.py", line 143, in goahead
    k = self.parse_endtag(i)
  File "C:\Python26\lib\sgmllib.py", line 320, in parse_endtag
    self.finish_endtag(tag)
  File "C:\Python26\lib\sgmllib.py", line 360, in finish_endtag
    self.unknown_endtag(tag)
  File "C:\Python26\lib\site-packages\feedparser.py", line 476, in unknown_endtag
    method()
  File "C:\Python26\lib\site-packages\feedparser.py", line 1318, in _end_content
    value = self.popContent('content')
  File "C:\Python26\lib\site-packages\feedparser.py", line 700, in popContent
    value = self.pop(tag)
  File "C:\Python26\lib\site-packages\feedparser.py", line 641, in pop
    output = _resolveRelativeURIs(output, self.baseuri, self.encoding)
  File "C:\Python26\lib\site-packages\feedparser.py", line 1594, in _resolveRelativeURIs
    p.feed(htmlSource)
  File "C:\Python26\lib\site-packages\feedparser.py", line 1441, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "C:\Python26\lib\sgmllib.py", line 104, in feed
    self.goahead(0)
  File "C:\Python26\lib\sgmllib.py", line 138, in goahead
    k = self.parse_starttag(i)
  File "C:\Python26\lib\sgmllib.py", line 296, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "C:\Python26\lib\sgmllib.py", line 338, in finish_starttag
    self.unknown_starttag(tag, attrs)
  File "C:\Python26\lib\site-packages\feedparser.py", line 1588, in unknown_starttag
    attrs = [(key, ((tag, key) in self.relative_uris) and self.resolveURI(value) or value) for key, value in attrs]
  File "C:\Python26\lib\site-packages\feedparser.py", line 1584, in resolveURI
    return _urljoin(self.baseuri, uri)
  File "C:\Python26\lib\site-packages\feedparser.py", line 286, in _urljoin
    return urlparse.urljoin(base, uri)
  File "C:\Python26\lib\urlparse.py", line 215, in urljoin
    params, query, fragment))
  File "C:\Python26\lib\urlparse.py", line 184, in urlunparse
    return urlunsplit((scheme, netloc, url, query, fragment))
  File "C:\Python26\lib\urlparse.py", line 192, in urlunsplit
    url = scheme + ':' + url
  File "C:\Python26\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)

PARTIALLY SOLVED:

This is reproducible when the URL passed to feedparser.parse() is a unicode object. It won’t reproduce when the URL is a plain ASCII string. And for the record, you need a feed whose content contains some non-ASCII characters. I am not sure why this is.
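One plausible reading of that observation, offered as an assumption rather than a confirmed diagnosis: on Python 2, joining a unicode base URL with a byte-string link taken from the feed forces an implicit decode of those bytes with the ascii codec, and that decode fails on the first non-ASCII byte. The sketch below triggers the same kind of failure explicitly. `link_bytes` is a made-up value, not data from the actual feed.

```python
# Hypothetical link text containing a right single quote (U+2019),
# which UTF-8 encodes as the three bytes 0xe2 0x80 0x99.
link_bytes = u"it\u2019s".encode("utf-8")

try:
    # Python 2 performs this decode implicitly when a byte string is
    # concatenated with a unicode string; here it is spelled out.
    link_bytes.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

If that reading is right, passing a byte-string URL (for example `url.encode('ascii')` on Python 2) would avoid the mixed concatenation, which fits the report that plain ASCII URLs do not reproduce the error.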

1 Answer

Editorial Team · Answer 1 · 2026-05-15T00:51:29+00:00

It looks like the URL that is giving you problems serves text in some encoding (such as latin-1, where 0xe2 is â, a lowercase a with a circumflex) without a proper Content-Type header: the header should carry a charset= parameter but doesn’t.

If that is the case, feedparser cannot guess the encoding, tries the default (ascii), and fails.

This part of feedparser’s docs explains the issues in more detail.

Unfortunately there are no “magic bullets” to solve this general issue (due to bozos that break the XML rules). You could try catching this exception and, in the handler, reading the URL’s contents separately (use urllib2) and decoding them with various possible encodings. When you finally get a usable unicode object this way, feed that to feedparser.parse (whose first arg can be a URL, a file stream, or a unicode string with the data).
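A minimal sketch of the "try several encodings" step might look like the following. The helper name `decode_with_fallbacks` and the candidate list are assumptions made for illustration; neither comes from feedparser, and the right candidates depend on the feeds you actually see.

```python
def decode_with_fallbacks(raw_bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw_bytes.

    The candidate order is an assumption: utf-8 is strict and goes first,
    cp1252 rejects a handful of undefined bytes, and latin-1 accepts any
    byte sequence, so it acts as a last resort.
    """
    for encoding in encodings:
        try:
            return raw_bytes.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


# Intended use inside the exception handler (not executed here):
#   raw = urllib2.urlopen(url).read()        # Python 2, as suggested above
#   text, used = decode_with_fallbacks(raw)
#   d = feedparser.parse(text)
```

Because latin-1 never fails, a successful decode does not prove the text is correct; it only gives feedparser a unicode string to work with, so it is worth logging which encoding was used.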
