This is my code: from bs4 import BeautifulSoup as BS import urllib2 url =

Question

Asked: June 16, 2026 (2026-06-16T04:34:48+00:00)

This is my code:

from bs4 import BeautifulSoup as BS
import urllib2
url = "http://services.runescape.com/m=news/recruit-a-friend-for-free-membership-and-xp"
res = urllib2.urlopen(url)
soup = BS(res.read())
other_content = soup.find_all('div',{'class':'Content'})[0]
print other_content

Yet an error comes up:

/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py:149: RuntimeWarning: Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.
  "Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help."))
Traceback (most recent call last):
  File "web.py", line 5, in <module>
    soup = BS(res.read())
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__
    self._feed()
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed
    self.builder.feed(self.markup)
  File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 150, in feed
    raise e

I’ve let two other people use this code, and it works for them perfectly fine. Why is it not working for me? I have bs4 installed…

1 Answer

Editorial Team · Answer 1 · 2026-06-16T04:34:50+00:00

Per the error message, you likely need to install lxml, which gives BeautifulSoup a more capable parsing engine than Python's built-in HTMLParser. See the "Installing a parser" section of the docs for a better overview. The likely reason it works for the two other people is that they have lxml (or another parser that handles this HTML properly) installed, so BeautifulSoup uses it instead of the built-in parser. (Side note: your example works for me on a system with lxml installed, but fails on one without it.)

Also, see this note in the docs:

If you’re using a version of Python 2 earlier than 2.7.3, or a version
of Python 3 earlier than 3.2.2, it’s essential that you install lxml
or html5lib–Python’s built-in HTML parser is just not very good in
older versions.

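I would recommend running sudo apt-get install python-lxml and seeing if the problem continues. That command is for Debian or Ubuntu; the /Library/Python/2.7 paths in your traceback suggest macOS, where pip install lxml should be the equivalent.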
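To check which parser a given machine would end up with, and to make the choice explicit instead of relying on whatever happens to be installed, something like the following may help. It is only a sketch under a few assumptions: it targets Python 3 rather than the 2.7 in the question, and the helper names pick_parser and parse_html are made up for illustration, not part of Beautiful Soup. The parser names "lxml", "html5lib", and "html.parser" are the ones Beautiful Soup accepts as its second argument.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first installed third-party parser name, else the built-in one.

    Hypothetical helper: it only checks whether each module can be found,
    without importing it.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    # Fall back to Python's built-in parser, the one that failed in the question.
    return "html.parser"


def parse_html(markup):
    """Parse markup with an explicitly chosen parser (hypothetical helper)."""
    # Imported here so pick_parser() stays usable even without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(markup, pick_parser())


if __name__ == "__main__":
    # Running this on both machines shows whether they differ in parser.
    print("Parser that would be used:", pick_parser())
```

If this prints html.parser on your machine and lxml on the machines where the script works, that would be consistent with the explanation above. Passing the parser name explicitly also keeps the behavior the same across machines that have different parsers installed.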
