
The Archive Base

Editorial Team
Asked: May 26, 2026 (2026-05-26T16:48:10+00:00)


OK, so I have another Python Unicode problem. In IDLE on Windows 7, the following code:

uni = u"\u4E0D\u65E0"
binary = uni.encode("utf-8")
print binary

prints two Chinese characters, 不无, which are the correct ones. However, if I replace the first line with

uni = u"\u65E0"

i.e. only the second character, it prints æ—  instead. However, if I replace it with only the first character

u"\u4E0D"

it gives the correct output, 不.

Is this a bug, or what am I doing wrong?

COMPLETE CODE:

uni = u"\u4E0D\u65E0"
binary = uni.encode("utf-8")
print binary

uni = u"\u65E0"
binary = uni.encode("utf-8")
print binary

uni = u"\u4E0D"
binary = uni.encode("utf-8")
print binary

OUTPUT:

不无

æ— 

不


1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 4:48 pm (2026-05-26T16:48:11+00:00)

    The unicode string u"\u4E0D\u65E0" consists of the two text characters 不 and 无.

    When a unicode string is encoded, it is converted into a sequence of bytes (a byte string, not "binary"). Depending on which encoding is used, there may not be a one-to-one mapping of text characters to bytes. The “utf8” encoding, for instance, uses from one to four bytes to represent a single character:

    >>> u'\u65E0'.encode('utf8')
    '\xe6\x97\xa0'
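
To make the variable-length point concrete, here is a minimal sketch that counts the UTF-8 bytes for a few sample characters. It is written with Python 3 style `print()` calls (an assumption; the question uses Python 2), and the helper name `utf8_length` and the sample characters are purely illustrative.

```python
# Illustrative sample characters: ASCII, Latin-1 range, CJK, and an emoji.
samples = [u"A", u"\u00e9", u"\u65e0", u"\U0001F600"]


def utf8_length(ch):
    """Return the number of bytes needed to encode one character as UTF-8."""
    return len(ch.encode("utf-8"))


lengths = [utf8_length(ch) for ch in samples]

for ch, n in zip(samples, lengths):
    # Print the code point and how many bytes its UTF-8 form occupies.
    print("U+%04X -> %d byte(s)" % (ord(ch), n))
```

The character from the question, U+65E0, falls in the three-byte range, which matches the three bytes shown above.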
    

    Now, before a sequence of bytes can be printed, Python (or IDLE) has to try to decode it. Since it has no way of knowing which encoding was used, it is forced to guess. For some reason, IDLE appears to have wrongly guessed “cp1252” for one of the examples:

    >>> text = u'\u65E0'.encode('utf8').decode('cp1252')
    >>> text
    u'\xe6\u2014\xa0'
    >>> print text
    æ— 
    

    Note that there are three characters in text – the last one is a non-breaking space.
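
One way to check what those three characters are is to look up their Unicode names. The following is a small sketch in Python 3 syntax (an assumption, since the session above is Python 2); the variable names `raw`, `garbled` and `names` are illustrative only.

```python
import unicodedata

# Encode the character from the question as UTF-8, then mis-decode as cp1252.
raw = u"\u65e0".encode("utf-8")
garbled = raw.decode("cp1252")

# Look up the official Unicode name of each resulting character.
names = [unicodedata.name(ch) for ch in garbled]

for ch, name in zip(garbled, names):
    print("U+%04X %s" % (ord(ch), name))
```

Each of the three UTF-8 bytes becomes one character under “cp1252”, which is why a single Chinese character turns into three unrelated ones.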

    EDIT

    Strictly speaking, IDLE wrongly guesses “cp1252” for all three examples. The second one only “succeeds” because each byte coincidentally maps to a valid text character (“cp1252” is an 8-bit, single-byte encoding). The other two examples contain the byte \x8d, which is not defined in “cp1252”. For these cases, IDLE (eventually) falls back to “utf8”, which gives the correct output.
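
The try-then-fall-back behaviour described here can be imitated with a short sketch. This is not IDLE's actual code: `decode_with_fallback` is a hypothetical helper, the order of encodings is an assumption taken from the explanation above, and the syntax is Python 3 style.

```python
def decode_with_fallback(data, encodings=("cp1252", "utf-8")):
    """Try each encoding in turn; return (text, encoding_that_worked)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            # e.g. byte 0x8d has no mapping in cp1252, so try the next one
            continue
    raise ValueError("none of the encodings could decode the data")


# The three strings from the question, encoded as UTF-8 before "printing".
for text in (u"\u4e0d\u65e0", u"\u65e0", u"\u4e0d"):
    decoded, used = decode_with_fallback(text.encode("utf-8"))
    print("%r decoded via %s" % (decoded, used))
```

Under these assumptions, only the middle string decodes "successfully" as “cp1252”, reproducing the garbled output, while the other two fall through to “utf-8”.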



© 2021 The Archive Base. All Rights Reserved