In [1]: str=’美’ In [2]: str.encode(‘utf-8′) Out[2]: b’\xe7\xbe\x8e’ In [3]: str.encode(‘utf-16′) Out[3]: b’\xff\xfe\x8e\x7f’ In

Question

0

Asked: May 28, 20262026-05-28T04:35:03+00:00 2026-05-28T04:35:03+00:00

In [1]: str=’美’ In [2]: str.encode(‘utf-8′) Out[2]: b’\xe7\xbe\x8e’ In [3]: str.encode(‘utf-16′) Out[3]: b’\xff\xfe\x8e\x7f’ In

0

In [1]: str='美'

In [2]: str.encode('utf-8')
Out[2]: b'\xe7\xbe\x8e'

In [3]: str.encode('utf-16')
Out[3]: b'\xff\xfe\x8e\x7f'

In [4]: str.encode('ascii')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
/Users/XXXuserXXXTemp/<ipython-input-4-c7b96e3e54a7> in <module>()
----> 1 str.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode character '\u7f8e' in position 0: ordinal not in range(128)

The str is a Chinese/Japanese character.

why ascii does not work?
how to understand Out[2] and Out[3], i.e. what they really are?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-28T04:35:04+00:00

Why ascii does not work?

str='美' is not an ASCII character, it’s outside the ASCII range and therefore can’t be represented as an ASCII character.

From the Unicode tutorial for python:

Encodings don’t have to handle every possible Unicode character, and most encodings don’t. For example, Python’s default encoding is the ‘ascii’ encoding. The rules for converting a Unicode string into the ASCII encoding are simple; for each code point:

If the code point is < 128, each byte is the same as the value of the code point.

If the code point is 128 or greater, the Unicode string can’t be represented in this encoding. (Python raises a UnicodeEncodeError exception in this case.)

how to understand Out[2] and Out[3], i.e. what they really are?

They are byte strings (not character strings). Out[2] is the sequence of bytes which represents the 美 codepoint in UTF-8 code units. The notation \xe7 means a byte with the hexadecimal value e7. Out[3] is the sequence of bytes which represents the 美 codepoint in UTF-16 code units.

To understand the distinction between characters, bytes, and code units, read the Unicode tutorial for python carefully and completely. For another, fairly good, treatment of the same material, read Joel Spolsky’s The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). You should know this much, no excuses!

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

In [1]: str=’美’ In [2]: str.encode(‘utf-8′) Out[2]: b’\xe7\xbe\x8e’ In [3]: str.encode(‘utf-16′) Out[3]: b’\xff\xfe\x8e\x7f’ In

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply