In Python 2.7’s documentation, three rules about Unicode are described as follows


Asked: May 27, 2026


In Python 2.7’s documentation, three rules about Unicode are described as follows:

1. If the code point is <128, it’s represented by the corresponding byte value.
2. If the code point is between 128 and 0x7ff, it’s turned into two byte values between 128 and 255.
3. Code points >0x7ff are turned into three- or four-byte sequences, where each byte of the sequence is between 128 and 255.
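These are the rules UTF-8 uses to turn a code point into bytes. As a minimal sketch of what they mean in practice, the hypothetical helper `utf8_bytes` below applies them by hand to a single code point and cross-checks the result against Python's built-in `'utf-8'` codec. It assumes a valid code point and skips surrogate checking. The try/except lets it run on both Python 2.7 (`unichr`) and Python 3 (`chr`), and the cross-check sticks to code points up to 0xFFFF because narrow Python 2.7 builds reject larger values in `unichr`.

```python
# Sketch: a hand-written UTF-8 encoder following the three quoted rules.
# utf8_bytes is a hypothetical helper, not a Python API. It assumes a valid
# code point (0 to 0x10FFFF) and does not reject surrogates (0xD800-0xDFFF).

try:
    unichr                # Python 2.7
except NameError:
    unichr = chr          # Python 3: chr() returns a one-character str


def utf8_bytes(code_point):
    """Return the UTF-8 byte values for one code point, as a list of ints."""
    if code_point < 0x80:
        # Rule 1: a single byte equal to the code point.
        return [code_point]
    if code_point <= 0x7FF:
        # Rule 2: two bytes, 110xxxxx 10xxxxxx.
        return [0xC0 | (code_point >> 6),
                0x80 | (code_point & 0x3F)]
    if code_point <= 0xFFFF:
        # Rule 3, three-byte form: 1110xxxx 10xxxxxx 10xxxxxx.
        return [0xE0 | (code_point >> 12),
                0x80 | ((code_point >> 6) & 0x3F),
                0x80 | (code_point & 0x3F)]
    # Rule 3, four-byte form: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    return [0xF0 | (code_point >> 18),
            0x80 | ((code_point >> 12) & 0x3F),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F)]


for cp in (0x41, 0xE9, 0x7FF, 0x1234, 0xA000):
    manual = utf8_bytes(cp)
    # Cross-check against Python's built-in UTF-8 codec.
    assert manual == list(bytearray(unichr(cp).encode('utf-8')))
    print('U+%04X -> %s' % (cp, ' '.join('%02X' % b for b in manual)))
```

For 0xA000 (40960) the helper yields the three bytes EA 80 80, each between 128 and 255, as the third rule predicts.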

Then I made some tests about it:

>>> unichr(40960)
u'\ua000'
>>> ord(u'\ua000')
40960

In my view, 40960 is a code point > 0x7ff, so it should be turned into a three- or four-byte sequence in which each byte is between 128 and 255. Instead, it seems to be turned into only a two-byte sequence, and the '00' in u'\ua000' is lower than 128, which does not match the rules above. Why?

What’s more, I found other Unicode characters, such as u'\u1234', where the values "12" and "34" are also lower than 128, even though according to the rules quoted first they shouldn’t be. Is there another rule I’m missing?

Thanks in advance for any answers.


1 Answer

Editorial Team · Answer 1 · 2026-05-27T01:37:33+00:00

> In Python 2.7’s documentation, three rules about Unicode are described as follows:

That is a description of the UTF-8 encoding.

> Then I made some tests about it:

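\ua000 is an escape sequence representing a single Unicode character. The a000 is the hexadecimal representation of that character's code point (40960 in decimal). Likewise, u'\u1234' is one character, U+1234, not the two values 0x12 and 0x34. None of this has anything to do with the UTF-8 encoding.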
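A short check of that point, assuming Python 2.7 or Python 3.3+ (both accept the u'' prefix): each escape produces one character whose ord() is the hex number written in the escape. The variable names are arbitrary.

```python
# The hex digits after \u spell out the code point; no UTF-8 bytes are involved.
s = u'\ua000'
assert len(s) == 1                  # one character, not two "bytes" a0 and 00
assert ord(s) == 0xA000 == 40960    # a000 is simply 40960 written in hexadecimal

# Two separate characters U+00A0 and U+0000 would be a different string.
assert u'\xa0\x00' != s and len(u'\xa0\x00') == 2

t = u'\u1234'
assert len(t) == 1 and ord(t) == 0x1234 == 4660
print('U+%04X = %d' % (ord(t), ord(t)))   # U+1234 = 4660
```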

You only get the UTF-8 byte sequences described by those rules when you explicitly encode a unicode string using the UTF-8 codec.
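A minimal sketch of that explicit encoding step for the two characters from the question, assuming the standard 'utf-8' codec; it runs on Python 2.7 and Python 3.3+.

```python
# Only an explicit .encode('utf-8') produces the byte sequences from the rules.
for ch in (u'\ua000', u'\u1234'):
    encoded = ch.encode('utf-8')          # str on Python 2.7, bytes on Python 3
    values = list(bytearray(encoded))     # byte values as ints on both versions
    # Both code points are > 0x7ff, so the third rule gives three bytes in 128..255.
    assert len(values) == 3 and all(128 <= v <= 255 for v in values)
    assert encoded.decode('utf-8') == ch  # decoding restores the original character
    print('U+%04X -> %s' % (ord(ch), ' '.join('%02X' % v for v in values)))
```

It prints U+A000 -> EA 80 80 and U+1234 -> E1 88 B4: three bytes per character, every one of them between 128 and 255.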
