I’m handling an encoding problem. My input is a unicode string, such as: >>>


Asked: June 17, 2026


I’m handling an encoding problem.
My input is a unicode string, such as:

>>> s
u'\xa6\xe8\xac\xc9'

The content is actually cp950-encoded bytes. Decoding works when the literal is a plain byte string (notice there’s no “u” prefix):

>>> print unicode('\xa6\xe8\xac\xc9', 'cp950')
西界

However, I don’t know how to get rid of that “u”.
Direct conversion does not work:

>>> str(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

The result of using encode() is not what I wanted:

>>> s.encode('utf8')
'\xc2\xa6\xc3\xa8\xc2\xac\xc3\x89'

What I want is the byte string '\xa6\xe8\xac\xc9'.


1 Answer

Editorial Team · Answer 1 · 2026-06-17T19:26:13+00:00

This is a bit of an abuse of the unicode type. Characters in a unicode string are expected to be Unicode code points (e.g. u'\u897f\u754c') and are therefore encoding-agnostic. They are not supposed to be bytes from a specific encoding. Python 3 makes this distinction very clear by separating Unicode strings (str) from byte strings (bytes).
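To make the distinction concrete, here is a small sketch. It assumes Python 3, where the question's Python 2 `unicode` type corresponds to `str`. The variable names are only illustrative. The same two code points produce different byte sequences depending on the encoding chosen:

```python
# Python 3 sketch: text is a sequence of code points; bytes depend on the encoding.
text = '\u897f\u754c'  # the two characters from the question, written as code points

big5_bytes = text.encode('cp950')   # the bytes the question wants
utf8_bytes = text.encode('utf-8')   # same text, different byte sequence

print(len(text))    # number of code points, not bytes
print(big5_bytes)
print(utf8_bytes)
```

The string itself carries no encoding. An encoding only comes into play when converting to or from bytes.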

Since you just want to interpret each code point as a single byte, you can do:

u'\xa6\xe8\xac\xc9'.encode('iso-8859-1')

This works because the first 256 code points of Unicode are defined to be equal to the code points of ISO-8859-1, so each code point maps back to the byte with the same value. However, please try to fix the issue that gave you this incorrect Unicode string in the first place.
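The full round trip can be sketched as follows, again assuming Python 3 rather than the Python 2 used in the question. The helper name `repair_misdecoded` is hypothetical, not part of any library. It recovers the raw bytes via ISO-8859-1 and then decodes them with the encoding the data was really in:

```python
def repair_misdecoded(text, real_encoding='cp950'):
    """Hypothetical helper: undo a wrong byte-to-code-point decode.

    Assumes every code point in `text` is below 256; otherwise
    encode() raises UnicodeEncodeError.
    """
    raw = text.encode('iso-8859-1')   # code point N becomes byte N
    return raw.decode(real_encoding)  # decode with the actual encoding


broken = '\xa6\xe8\xac\xc9'           # the mis-decoded string from the question
raw = broken.encode('iso-8859-1')     # the bytes the question asks for
fixed = repair_misdecoded(broken)     # the intended text

print(raw)
print(fixed)
```

This is a workaround, not a fix. If the string came from a file or a network read, decoding those bytes as cp950 at the point where they enter the program avoids the detour entirely.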
