I’m parsing some HTML with Beautiful Soup 3, but it contains HTML entities which

Question

Asked: May 13, 2026 (2026-05-13T12:22:15+00:00)

I’m parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn’t automatically decode for me:

>>> from BeautifulSoup import BeautifulSoup

>>> soup = BeautifulSoup("<p>&pound;682m</p>")
>>> text = soup.find("p").string

>>> print text
&pound;682m

How can I decode the HTML entities in text to get "£682m" instead of "&pound;682m"?

1 Answer

Editorial Team · Answer 1 · 2026-05-13T12:22:16+00:00

Python 3.4+

Use html.unescape():

import html
print(html.unescape('&pound;682m'))

Note that html.parser.HTMLParser.unescape has been deprecated since Python 3.4. It was supposed to be removed in 3.5 but was left in by mistake, and it was finally removed in Python 3.9.
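
As a small illustration of the same call, html.unescape() should also decode numeric character references, not only named ones such as &pound;. This is a minimal sketch assuming Python 3.4 or later; the sample strings are made up for the example and do not come from the question:

```python
import html

# Three spellings of the same pound sign: a named reference,
# a decimal reference, and a hexadecimal reference.
samples = ['&pound;682m', '&#163;682m', '&#xA3;682m']

decoded = [html.unescape(s) for s in samples]
print(decoded)  # ['£682m', '£682m', '£682m']
```

With Beautiful Soup, the same function can be applied to the string you pulled out of the tag.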

Python 2.6-3.3

You can use HTMLParser.unescape() from the standard library:

For Python 2.6-2.7 it’s in HTMLParser
For Python 3 it’s in html.parser

>>> try:
...     # Python 2.6-2.7 
...     from HTMLParser import HTMLParser
... except ImportError:
...     # Python 3
...     from html.parser import HTMLParser
... 
>>> h = HTMLParser()
>>> print(h.unescape('&pound;682m'))
£682m

You can also use the six compatibility library to simplify the import:

>>> from six.moves.html_parser import HTMLParser
>>> h = HTMLParser()
>>> print(h.unescape('&pound;682m'))
£682m
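
If the same code has to run on several Python versions, one possible approach is to pick whichever unescape implementation is available at import time. The sketch below is only an assumption about how you might wire that up; decode_entities is a hypothetical helper name, not part of any library:

```python
try:
    # Python 3.4+: module-level function
    from html import unescape
except ImportError:
    try:
        # Python 3.0-3.3
        from html.parser import HTMLParser
    except ImportError:
        # Python 2.6-2.7
        from HTMLParser import HTMLParser
    # Fall back to the bound method of a parser instance
    unescape = HTMLParser().unescape


def decode_entities(text):
    """Hypothetical helper: replace HTML character references in text."""
    return unescape(text)


print(decode_entities('&pound;682m'))
```

Trying html.unescape first means the deprecated method is only used on interpreters that have nothing better.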
