I’m working on a web parser in Python and am now stuck on special symbols like

Asked: June 18, 2026

I’m working on a web parser in Python and am now stuck on special symbols like ★, ✿, • and others. Sometimes I get them as garbled UTF-8 ("â¿") and sometimes as a unicode string of byte values (u"\xe2\x80\xa2"). I have found a table of these symbols, but the only thing I can do is:

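# Hand-built table: (text as received, numeric character reference).
# Named "replacements" so it does not shadow the built-in set type.
replacements = [
    (u"\xe2\x80\xa2", "&#8226;"),
    ("&#226;&#156;&#191;", "&#10047;"),
]
for old, new in replacements:
    s = s.replace(old, new)  # s holds the scraped text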

I wrote these pairs by hand because I could not find a table that maps the forms on the left to the ones on the right.

Can you help me, please?

1 Answer

Editorial Team · Answer 1 · 2026-06-18T09:53:47+00:00

Given a Unicode string containing a single character:

symbol = u'★'

It can be converted to the HTML syntax like so:

html = '&#{};'.format(ord(symbol))

To convert back, extract the number by stripping off the &# and ;, convert to an integer, and then use chr (Python 3) or unichr (Python 2).
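As a minimal sketch of that reverse step (Python 3 only, and assuming the input is exactly one decimal reference; the function name decode_decimal_reference is made up for illustration):

```python
def decode_decimal_reference(ref):
    """Turn one decimal reference such as '&#9733;' back into its character."""
    # Hypothetical helper: accepts only the '&#<digits>;' form.
    if not (ref.startswith('&#') and ref.endswith(';')):
        raise ValueError('not a numeric character reference: %r' % ref)
    # Strip the leading '&#' and the trailing ';', then parse the digits.
    return chr(int(ref[2:-1]))


symbol = '★'
html = '&#{};'.format(ord(symbol))
assert decode_decimal_reference(html) == symbol
```

On Python 2 the same idea would use unichr in place of chr.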

If your input does not come from the conversion above, you may also need to handle hexadecimal references, which look like &#xZZZ; where ZZZ is a run of hexadecimal digits. To detect these, check whether the number starts with an x; if it does, parse the remainder with radix 16.
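One way to handle both the decimal and the hexadecimal form across a whole string is a regular expression with a replacement callback. This is only a sketch: the pattern and the name decode_numeric_references are assumptions for illustration, and references to code points outside the Unicode range would make chr raise ValueError.

```python
import re

# Matches '&#123;' (decimal) or '&#x7B;' / '&#X7B;' (hexadecimal).
_NUMERIC_REF = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')


def _replace_numeric(match):
    hex_digits, dec_digits = match.groups()
    if hex_digits is not None:
        # The reference started with an x: parse with radix 16.
        return chr(int(hex_digits, 16))
    return chr(int(dec_digits))


def decode_numeric_references(text):
    """Replace every numeric character reference in text with its character."""
    return _NUMERIC_REF.sub(_replace_numeric, text)


decoded = decode_numeric_references('&#x2605; and &#9733; are the same star')
```

Named entities such as &amp; are left untouched by this pattern.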

Furthermore, you may need to deal with named entities. See the last two paragraphs for that.

If you want Python to deal with the encoding of a whole string, you can use this:

text = u"I like symb★ls!"
html = text.encode('ascii', errors='xmlcharrefreplace').decode('ascii')

Unfortunately, there is no equivalent error handler for decoding, and this also does not escape potentially hazardous HTML characters such as < (which may or may not be what you want). If you need to decode, consider using a proper HTML parser, which can also deal with named entities like &clubs; (♣).
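If a full parser is more than you need, the standard library's html module (Python 3.4 and later for html.unescape) may cover both gaps. The sketch below is one possible combination, not the only one; the function names to_ascii_html and from_html are made up for illustration.

```python
import html


def to_ascii_html(text):
    """Escape &, <, > and quotes, then turn non-ASCII characters into references."""
    escaped = html.escape(text)
    return escaped.encode('ascii', errors='xmlcharrefreplace').decode('ascii')


def from_html(text):
    """Decode numeric references and named entities back to characters."""
    return html.unescape(text)


encoded = to_ascii_html('1 < 2 ★')
decoded = from_html('symb&#9733;ls &clubs; &lt;')
```

Escaping first matters here: html.escape also rewrites &, so running it after xmlcharrefreplace would mangle the references that were just created.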

If you want to deal with named entities and do not want to use a real HTML parser, there is a list of entities that can be read with Python’s json module.
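As an alternative to loading such a list yourself, Python 3 ships a table of the HTML 4.01 entity names in html.entities.name2codepoint. The sketch below swaps in that table rather than the JSON list mentioned above; the pattern and the name decode_named_entities are assumptions for illustration.

```python
import re
from html.entities import name2codepoint

# A named entity: '&', a letter, then letters or digits, then ';'.
_NAMED_REF = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')


def decode_named_entities(text):
    """Replace known named entities; leave unknown names untouched."""
    def replace(match):
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)  # unknown entity: keep the original text
    return _NAMED_REF.sub(replace, text)


decoded = decode_named_entities('&clubs; &bull; &nosuchentity;')
```

The same module also provides html.entities.html5, a larger mapping that includes the names added in HTML5.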
