Question

Asked: May 12, 2026

I thought BeautifulSoup would be able to handle malformed documents, but when I passed it the source of a page, it printed the following traceback:


Traceback (most recent call last):
  File "mx.py", line 7, in <module>
    s = BeautifulSoup(content)
  File "build\bdist.win32\egg\BeautifulSoup.py", line 1499, in __init__
  File "build\bdist.win32\egg\BeautifulSoup.py", line 1230, in __init__
  File "build\bdist.win32\egg\BeautifulSoup.py", line 1263, in _feed
  File "C:\Python26\lib\HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "C:\Python26\lib\HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "C:\Python26\lib\HTMLParser.py", line 314, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "C:\Python26\lib\HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: u"", at line 258, column 34

Shouldn’t it be able to handle this sort of stuff? If it can handle them, how could I do it? If not, is there a module that can handle malformed documents?

EDIT: Here’s an update. I saved the page locally using Firefox and tried to create a soup object from the contents of the file. That’s where BeautifulSoup fails. If I create a soup object directly from the website, it works. Here’s the document that causes trouble for soup.

1 Answer

Editorial Team · Answer 1 · 2026-05-12T00:07:33+00:00

Worked fine for me using BeautifulSoup version 3.0.7. The latest is 3.1.0, but there’s a note on the BeautifulSoup home page to try 3.0.7a if you’re having trouble. I think I ran into a problem similar to yours some time ago and reverted to the older version, which fixed it; I’d try that.

If you want to stick with your current version, I suggest removing the large <script> block at the top, since that is where the error occurs, and since you cannot parse that section with BeautifulSoup anyway.
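
One way to remove the script block before parsing is sketched below. This is only an illustration, not something BeautifulSoup provides: the helper name `strip_script_blocks`, the `SCRIPT_BLOCK` pattern, and the sample markup are all made up for the example, and the sample only assumes that the page contains a closing tag split up inside a script, which is a common cause of "bad end tag" errors.

```python
import re

# Hypothetical helper pattern, not part of BeautifulSoup: matches an opening
# <script ...> tag, everything up to the first closing </script>, and the
# closing tag itself.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_script_blocks(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_BLOCK.sub("", html)


# Assumed sample input: a script that writes a split-up closing tag.
sample = (
    "<html><head>"
    "<script type=\"text/javascript\">document.write('</scr' + 'ipt>');</script>"
    "</head><body><p>Hello</p></body></html>"
)

cleaned = strip_script_blocks(sample)
print(cleaned)
```

The cleaned string can then be passed to `BeautifulSoup(cleaned)` in place of the original content. A regular expression is a crude tool here: a script that contains a literal `</script>` inside a string would be cut off early, so check the result on the actual page.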
