The Archive Base

Editorial Team
Asked: June 14, 2026 (2026-06-14T12:30:24+00:00)

unichr(0x10000) fails with a ValueError when CPython is compiled without --enable-unicode=ucs4.

Is there a language builtin or core library function that converts an arbitrary unicode scalar value or code-point to a unicode string that works regardless of what kind of python interpreter the program is running on?

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 12:30 pm (2026-06-14T12:30:25+00:00)

    Yes, here you go:

    >>> unichr(0xd800)+unichr(0xdc00)
    u'\U00010000'
    

    The crucial point to understand is that unichr() converts an integer to a single code unit in the Python interpreter’s string encoding. The Python Standard Library documentation for 2.7.3, section 2, Built-in Functions, says of unichr():

    Return the Unicode string of one character whose Unicode code is the integer i…. The valid range for the argument depends how Python was configured – it may be either UCS2 [0..0xFFFF] or UCS4 [0..0x10FFFF]. ValueError is raised otherwise.

    I added emphasis to “one character”, by which they mean “one code unit” in Unicode terms.

    I’m assuming that you are using Python 2.x. The Python 3.x interpreter has no built-in unichr() function. Instead, the Python Standard Library documentation for 3.3.0, section 2, Built-in Functions, says of chr():

    Return the string representing a character whose Unicode codepoint is the integer i…. The valid range for the argument is from 0 through 1,114,111 (0x10FFFF in base 16).

    Note that the return value is now a string of unspecified length, not a string with a single code unit. So in Python 3.x, chr(0x10000) would behave as you expected. It “converts an arbitrary unicode scalar value or code-point to a unicode string that works regardless of what kind of python interpreter the program is running on”.
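
    On Python 3.3 and later this is easy to check. The snippet below is a small illustration, not part of the quoted documentation. It assumes Python 3.3 or later, where PEP 393 removed the narrow/wide build distinction, so a supplementary code point occupies a single position in the string:

```python
# Assumes Python 3.3 or later; on Python 2, chr() only accepts 0..255.
s = chr(0x10000)

# One code point, regardless of how the interpreter was built.
length = len(s)
code_point = ord(s)

# The UTF-16 surrogate pair only shows up when the string is encoded.
utf16_bytes = s.encode('utf-16-be')
print(length, hex(code_point), utf16_bytes)
```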

    But back to Python 2.x. If you use unichr() to create Python 2.x unicode objects, and you are using Unicode scalar values above 0xFFFF, then you are committing your code to being aware of the Python interpreter’s implementation of unicode objects.
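
    One way to find out which kind of interpreter the code is running on is sys.maxunicode, which is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build. A minimal sketch; the helper names is_narrow_build and code_units_needed are made up for this illustration:

```python
import sys

def is_narrow_build():
    """True when the interpreter stores unicode strings as 16-bit code units."""
    return sys.maxunicode == 0xFFFF

def code_units_needed(scalar):
    """Number of string elements this interpreter needs for one code point."""
    if not 0 <= scalar <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (scalar,))
    if scalar > 0xFFFF and is_narrow_build():
        return 2  # stored as a surrogate pair
    return 1

print(is_narrow_build(), code_units_needed(0x41), code_units_needed(0x10000))
```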

    You can isolate this awareness with a function which tries unichr() on a scalar value, catches ValueError, and tries again with the corresponding UTF-16 surrogate pair:

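    def unichr_supplemental(scalar):
        """Return the unicode string for a code point on narrow or wide builds."""
        try:
            return unichr(scalar)
        except ValueError:
            # Narrow (UCS2) build: only supplementary code points can be
            # rescued, by encoding them as a UTF-16 surrogate pair.
            if not 0x10000 <= scalar <= 0x10FFFF:
                raise
            offset = scalar - 0x10000
            return unichr(0xd800 + offset // 0x400) + unichr(0xdc00 + offset % 0x400)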
    
    >>> unichr_supplemental(0x41),len(unichr_supplemental(0x41))
    (u'A', 1)
    >>> unichr_supplemental(0x10000), len(unichr_supplemental(0x10000))
    (u'\U00010000', 2)
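
    The surrogate arithmetic in the fallback can be checked on its own, without calling unichr(). The sketch below works on plain integers, so it runs on both Python 2 and Python 3. The names to_surrogate_pair and from_surrogate_pair are illustrative only and were not tested as part of this answer:

```python
def to_surrogate_pair(scalar):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %r" % (scalar,))
    offset = scalar - 0x10000          # 20-bit value
    high = 0xD800 + (offset // 0x400)  # top 10 bits
    low = 0xDC00 + (offset % 0x400)    # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Inverse of to_surrogate_pair(): recombine the two code units."""
    return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)

for cp in (0x10000, 0x1F600, 0x10FFFF):
    high, low = to_surrogate_pair(cp)
    print("U+%06X -> %04X %04X" % (cp, high, low))
```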
    

    But you might find it easier to just convert your scalars to 4-byte UTF-32 values in a UTF-32 byte string, and decode this byte string into a unicode string:

    >>> '\x00\x00\x00\x41'.decode('utf-32be'), \
    ... len('\x00\x00\x00\x41'.decode('utf-32be'))
    (u'A', 1)
    >>> '\x00\x01\x00\x00'.decode('utf-32be'), \
    ... len('\x00\x01\x00\x00'.decode('utf-32be'))
    (u'\U00010000', 2)
    

    The code above was tested on Python 2.6.7 with UTF-16 encoding for Unicode strings. I didn’t test it on a Python 2.x interpreter with UTF-32 encoding for Unicode strings. However, it should work unchanged on any Python 2.x interpreter with any Unicode string implementation.
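
    To apply the UTF-32 approach to an arbitrary integer rather than a hand-written byte string, the four bytes can be built with struct.pack(). This is an untested sketch that was not part of the runs mentioned above. The name unichr_utf32 is hypothetical, and the explicit range check is an added assumption about how out-of-range input should be handled:

```python
import struct

def unichr_utf32(scalar):
    """Build a unicode string from a code point by decoding 4 UTF-32BE bytes."""
    if not 0 <= scalar <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (scalar,))
    # '>I' packs the value as an unsigned 32-bit big-endian integer,
    # which is exactly the UTF-32BE encoding of that code point.
    return struct.pack('>I', scalar).decode('utf-32-be')

print(repr(unichr_utf32(0x41)), repr(unichr_utf32(0x10000)))
```

    The length of the result still depends on the interpreter: 2 for a supplementary code point on a narrow Python 2 build, 1 otherwise.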

© 2021 The Archive Base. All Rights Reserved