This is my code:
from bs4 import BeautifulSoup as BS
import urllib2
url = "http://services.runescape.com/m=news/recruit-a-friend-for-free-membership-and-xp"
res = urllib2.urlopen(url)
soup = BS(res.read())
other_content = soup.find_all('div',{'class':'Content'})[0]
print other_content
Yet an error comes up:
/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py:149: RuntimeWarning: Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help.
"Python's built-in HTMLParser cannot parse the given document. This is not a bug in Beautiful Soup. The best solution is to install an external parser (lxml or html5lib), and use Beautiful Soup with that parser. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser for help."))
Traceback (most recent call last):
File "web.py", line 5, in <module>
soup = BS(res.read())
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__
self._feed()
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed
self.builder.feed(self.markup)
File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 150, in feed
raise e
I’ve let two other people use this code, and it works for them perfectly fine. Why is it not working for me? I have bs4 installed…
Per the error message, one thing you may need to do is install lxml, which will provide a more powerful parsing engine for BeautifulSoup to use. See this section in the docs for a better overview, but the likely reason that it works for two other people is that they have lxml (or another parser that handles the HTML properly) installed, meaning that BeautifulSoup uses it instead of the standard built-in parser (side note: your example works for me as well on a system with lxml installed, but fails on one without it).
Also, see this note in the docs.
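I would recommend running
sudo apt-get install python-lxml
and seeing if the problem continues. The /Library/Python/2.7 paths in your traceback suggest you are on macOS, where apt-get is normally not available; there, installing with pip (pip install lxml) should do the same job.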
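Once a parser is installed, it may also help to name it explicitly when building the soup, so the same script behaves the same way on every machine instead of depending on whatever BeautifulSoup happens to find. Below is a minimal sketch of that idea. The helper names pick_parser and parse_html are hypothetical (they are not part of bs4); the parser names "lxml", "html5lib" and "html.parser" are the ones BeautifulSoup accepts as its second argument.

```python
import importlib


def pick_parser(candidates=("lxml", "html5lib")):
    """Return the first candidate parser that can be imported.

    Hypothetical helper, not part of bs4. Falls back to "html.parser",
    the name BeautifulSoup uses for Python's built-in parser.
    """
    for name in candidates:
        try:
            importlib.import_module(name)
            return name
        except ImportError:
            continue
    return "html.parser"


def parse_html(markup):
    """Build a soup with an explicitly chosen parser (hypothetical helper)."""
    # Imported here so pick_parser() can be used as a quick diagnostic
    # even on a machine where bs4 itself is not installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, pick_parser())
```

In your script this would replace the BS(res.read()) call with parse_html(res.read()). Printing the result of pick_parser() on your machine and on one of the machines where the code works should also show whether a missing parser is really the difference between them.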