The Archive Base

Editorial Team
Asked: May 23, 2026

I have a university programming exam coming up, and one section is on unicode.

I have checked all over for answers to this, and my lecturer is useless so that’s no help, so this is a last resort for you guys to possibly help.

The question will be something like:

The string ‘mЖ丽’ has these unicode codepoints U+006D, U+0416 and
U+4E3D, with answers written in hexadecimal, manually encode the
string into UTF-8 and UTF-16.

Any help at all will be greatly appreciated as I am trying to get my head round this.


1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 3:00 am

    Wow. On the one hand I’m thrilled to know that university courses are teaching to the reality that character encodings are hard work, but actually knowing the UTF-8 encoding rules sounds like expecting a lot. (Will it help students pass the Turkey test?)

    The clearest description I’ve seen so far of the rules for encoding UCS codepoints to UTF-8 is from the utf-8(7) manpage found on many Linux systems:

    Encoding
       The following byte sequences are used to represent a
       character.  The sequence to be used depends on the UCS code
       number of the character:
    
       0x00000000 - 0x0000007F:
           0xxxxxxx
    
       0x00000080 - 0x000007FF:
           110xxxxx 10xxxxxx
    
       0x00000800 - 0x0000FFFF:
           1110xxxx 10xxxxxx 10xxxxxx
    
       0x00010000 - 0x001FFFFF:
           11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    
       [... removed obsolete five and six byte forms ...]
    
       The xxx bit positions are filled with the bits of the
       character code number in binary representation.  Only the
       shortest possible multibyte sequence which can represent the
       code number of the character can be used.
    
       The UCS code values 0xd800–0xdfff (UTF-16 surrogates) as well
       as 0xfffe and 0xffff (UCS noncharacters) should not appear in
       conforming UTF-8 streams.
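
The chart can be read as a lookup from codepoint to sequence length. Here is a minimal Python sketch of that lookup; the helper name `utf8_length` is made up for illustration and is not part of the manpage. It also rejects the surrogate range mentioned above.

```python
def utf8_length(cp: int) -> int:
    """Return how many bytes the chart assigns to code point cp."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    if 0xD800 <= cp <= 0xDFFF:
        # UTF-16 surrogates should not appear in conforming UTF-8.
        raise ValueError("surrogate code points are not encodable")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    # The chart's four-byte form reaches 0x1FFFFF,
    # although Unicode itself stops at 0x10FFFF.
    if cp <= 0x1FFFFF:
        return 4
    raise ValueError("beyond the four-byte range of the chart")


for cp in (0x6D, 0x416, 0x4E3D):
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

Checking the ranges from smallest to largest is what enforces the "shortest possible sequence" rule: a codepoint never reaches a longer form if a shorter one fits.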
    

    It might be easier to remember a ‘compressed’ version of the chart:

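    Every byte of a multi-byte sequence starts with a 1. The initial byte starts with one 1 for each byte in the sequence, followed by a 0 as padding; each subsequent byte starts with 10.

    0x80      5 bits in the initial byte, one subsequent byte
    0x800     4 bits in the initial byte, two subsequent bytes
    0x10000   3 bits in the initial byte, three subsequent bytes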
    

    You can derive the ranges by taking note of how much space you can fill with the bits allowed in the new representation:

    2**(5+1*6) == 2048       == 0x800
    2**(4+2*6) == 65536      == 0x10000
    2**(3+3*6) == 2097152    == 0x200000
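
The same arithmetic can be checked in a few lines of Python. This is only a sketch: the list `forms` is an illustrative name, and the first entry (7 bits, no subsequent bytes) is the plain ASCII form from the chart.

```python
# (payload bits in the initial byte, number of 6-bit subsequent bytes)
forms = [(7, 0), (5, 1), (4, 2), (3, 3)]

# Each form can hold code points strictly below 2**(total payload bits).
limits = [2 ** (lead_bits + 6 * cont) for lead_bits, cont in forms]

for (lead_bits, cont), limit in zip(forms, limits):
    print(f"{1 + cont} byte(s): code points below {limit:#x}")
```

Each limit is the first codepoint that no longer fits, so it is also the start of the next range in the chart.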
    

    I know I could remember the rules to derive the chart more easily than the chart itself. Here’s hoping you’re good at remembering rules too. 🙂

    Update

    Once you have built the chart above, you can convert input Unicode codepoints to UTF-8 by finding their range, converting from hexadecimal to binary, inserting the bits according to the rules above, then converting back to hex:

    U+4E3E
    

    This fits in the 0x00000800 - 0x0000FFFF range (0x0800 <= 0x4E3E <= 0xFFFF), so the representation will be of the form:

       1110xxxx 10xxxxxx 10xxxxxx
    

    0x4E3E is 100111000111110b. Drop the bits into the x above (start from the right, we’ll fill in missing bits at the start with 0):

       1110x100 10111000 10111110
    

    There is one x spot left over at the start; fill it in with 0:

       11100100 10111000 10111110
    

    Convert from bits to hex:

       0xE4 0xB8 0xBE
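
The manual steps above can be mirrored in Python as a way to check hand-worked answers. This is a sketch under stated assumptions, not a full encoder: `fill_template` is a hypothetical helper, and you still pick the template for the codepoint's range yourself.

```python
def fill_template(cp: int, template: str) -> bytes:
    """Fill the x positions of a UTF-8 bit template with the bits of cp."""
    n_slots = template.count("x")
    if cp >= 1 << n_slots:
        raise ValueError("code point does not fit this template")
    # Left-pad with zeros so there is exactly one bit per x slot;
    # this is the same as filling from the right and zeroing the rest.
    bits = iter(format(cp, "b").zfill(n_slots))
    filled = "".join(next(bits) if ch == "x" else ch for ch in template)
    # Each space-separated group of 8 bits is one output byte.
    return bytes(int(group, 2) for group in filled.split())


encoded = fill_template(0x4E3E, "1110xxxx 10xxxxxx 10xxxxxx")
print(encoded.hex(" "))  # e4 b8 be

# Cross-check against Python's built-in encoder.
assert encoded == chr(0x4E3E).encode("utf-8")
```

The final assertion compares the result with Python's own `str.encode`, which is a quick way to verify any codepoint you have encoded by hand.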
    
© 2021 The Archive Base. All Rights Reserved