The Archive Base

Editorial Team
Asked: May 26, 2026

What is the best way to (losslessly) convert Unicode to a lower-order byte encoding

What is the best way to (losslessly) convert Unicode to a lower-order byte encoding (8 bits), in a language-agnostic way? I want a format that is standard, i.e. has widespread library support for conversion in both directions.

If I were using Python, I would use repr:

In [1]: x = u"Российская Федерация"

In [2]: repr(x)
Out[2]: "u'\\xd0\\xa0\\xd0\\xbe\\xd1\\x81\\xd1\\x81\\xd0\\xb8\\xd0\\xb9\\xd1\\x81\\xd0\\xba\\xd0\\xb0\\xd1\\x8f \\xd0\\xa4\\xd0\\xb5\\xd0\\xb4\\xd0\\xb5\\xd1\\x80\\xd0\\xb0\\xd1\\x86\\xd0\\xb8\\xd1\\x8f'"

However, I’m looking for a format that has good library support for converting the second string back to the first, in a variety of languages.

1 Answer

    Editorial Team
    Added an answer on May 26, 2026 at 4:40 pm

    Out[2]: "u'\\xd0\\xa0\\xd0\\xbe\\xd1\\x81\\xd1\\x81\\xd0\\xb8\\xd0\\xb9\\xd1\\x81\\xd0\\xba\\xd0\\xb0\\xd1\\x8f \\xd0\\xa4\\xd0\\xb5\\xd0\\xb4\\xd0\\xb5\\xd1\\x80\\xd0\\xb0\\xd1\\x86\\xd0\\xb8\\xd1\\x8f'"

    If that’s what you see, your terminal is set up wrong: it’s treating UTF-8 input as ISO-8859-1 (or cp1252 in the case of the Windows console, which isn’t possible to set up right).
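To see where those \xd0\xa0 pairs come from, here is a minimal sketch of the misreading described above. It assumes Python 3 (the question uses Python 2 syntax), and the variable names are only illustrative:

```python
text = "Российская Федерация"

# Each Cyrillic letter becomes two bytes in UTF-8 (e.g. U+0420 -> D0 A0).
utf8_bytes = text.encode("utf-8")

# Misreading those bytes as ISO-8859-1 turns every byte into its own character,
# which is what a misconfigured terminal effectively does.
misread = utf8_bytes.decode("iso-8859-1")
print(ascii(misread))  # starts with '\xd0\xa0\xd0\xbe...'

# The misreading is reversible, because ISO-8859-1 maps all 256 byte values.
recovered = misread.encode("iso-8859-1").decode("utf-8")
print(recovered == text)  # True
```

The round trip works here only because ISO-8859-1 assigns a character to every byte; cp1252 leaves a few byte values undefined, so the same trick is not guaranteed to succeed there.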

    The proper Python repr of Российская Федерация would be the Unicode literal:

    u'\u0420\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u0430\u044f \u0424\u0435\u0434\u0435\u0440\u0430\u0446\u0438\u044f'
    

    Which, as it happens, is pretty close to the JavaScript/JSON string literal:

    "\u0420\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u0430\u044f \u0424\u0435\u0434\u0435\u0440\u0430\u0446\u0438\u044f"
    

    If you want a 7-bit-safe (ASCII) representation of a Unicode string, JSON is a reasonable choice of format. Generate it with json.dumps(), though, rather than by hacking the Python repr, since there are some subtle inconsistencies between the two formats: for example, the repr can emit \xNN and \UNNNNNNNN escapes, which JSON does not accept.
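As a rough illustration of the JSON route, the following sketch uses the standard-library json module under Python 3 (an assumption; the question is written for Python 2). It relies on the default ensure_ascii=True behaviour of json.dumps():

```python
import json

text = "Российская Федерация"

# With the default ensure_ascii=True, every non-ASCII character is written
# as a \uXXXX escape, so the result is pure 7-bit ASCII.
encoded = json.dumps(text)
print(encoded)  # "\u0420\u043e\u0441\u0441..."

# Any JSON parser, in any language, can turn this back into the original string.
decoded = json.loads(encoded)
print(decoded == text)  # True
```

Characters outside the Basic Multilingual Plane come out as a surrogate pair of two \uXXXX escapes, which JSON parsers are expected to recombine.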

    Other well-understood ASCII representations you could try might include URL-encoding (%D0%A0%D0%BE...) and XML character escapes (<value>&#x0420;&#x043e;&#x0441;...</value>).
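Both alternatives can be sketched with the Python 3 standard library (again an assumption about the environment). Note that Python's xmlcharrefreplace handler produces decimal references such as &#1056; rather than the hexadecimal form shown above; the two are equivalent in XML. Using html.unescape to decode is a shortcut for this demonstration, not a substitute for a real XML parser:

```python
from html import unescape
from urllib.parse import quote, unquote

text = "Российская Федерация"

# URL-encoding: percent-escapes the UTF-8 bytes of each non-ASCII character.
url_encoded = quote(text)
print(url_encoded)  # %D0%A0%D0%BE...

# XML character references: replace anything non-ASCII with &#NNNN;.
xml_escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(xml_escaped)  # &#1056;&#1086;...

# Both forms decode back to the original string.
print(unquote(url_encoded) == text)  # True
print(unescape(xml_escaped) == text)  # True
```

The xmlcharrefreplace step does not escape markup characters such as & or <, so text destined for a real XML document would still need those escaped separately.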

    If you only need an arbitrary binary representation that doesn’t need to be 7-bit safe, as Max mentioned, just .encode('utf-8').
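For completeness, a minimal Python 3 sketch of the plain UTF-8 round trip (in Python 2, as used in the question, the same methods exist on unicode and str objects):

```python
text = "Российская Федерация"

# UTF-8 is a lossless byte encoding of any Unicode string,
# but the bytes use the high bit, so the result is not 7-bit safe.
data = text.encode("utf-8")
print(data[:4])  # b'\xd0\xa0\xd0\xbe'

# Decoding with the same codec restores the original string.
restored = data.decode("utf-8")
print(restored == text)  # True
```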
