The Archive Base

Editorial Team
Asked: June 1, 2026 (2026-06-01T19:40:53+00:00)

Possible Duplicate:
Why UTF-32 exists whereas only 21 bits are necessary to encode every character?

The maximum Unicode code point is 0x10FFFF, so each 32-bit UTF-32 code unit carries at most 21 information bits and 11 superfluous zero bits. So why is there no UTF-24 encoding (i.e. UTF-32 with the high byte removed) that stores each code point in 3 bytes rather than 4?
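
The bit counts in the question can be checked directly. The following is a minimal sketch; the variable names are illustrative and not part of the original question.

```python
# Largest code point defined by Unicode.
MAX_CODE_POINT = 0x10FFFF

# Number of bits needed to represent the largest code point.
bits_needed = MAX_CODE_POINT.bit_length()

# Bits left unused in a 32-bit UTF-32 code unit,
# and in a hypothetical 24-bit code unit.
unused_in_32 = 32 - bits_needed
unused_in_24 = 24 - bits_needed

print(bits_needed, unused_in_32, unused_in_24)  # prints "21 11 3"
```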

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 7:40 pm

    Well, the truth is that UTF-24 was suggested in 2007:

    http://unicode.org/mail-arch/unicode-ml/y2007-m01/0057.html

    The pros and cons mentioned there were:

    "UTF-24 
    Advantages: 
     1. Fixed length code units. 
     2. Encoding format is easily detectable for any content, even if mislabeled. 
     3. Byte order can be reliably detected without the use of BOM, even for single-code-unit data. 
     4. If octets are dropped / inserted, decoder can resync at next valid code unit. 
     5. Practical for both internal processing and storage / interchange. 
     6. Conversion to code point scalar values is more trivial then for UTF-16 surrogate pairs 
        and UTF-7/8 multibyte sequences. 
     7. 7-bit transparent version can be easily derived. 
     8. Most compact for texts in archaic scripts. 
    Disadvantages: 
     1. Takes more space then UTF-8/16, except for texts in archaic scripts. 
     2. Comparing to UTF-32, extra bitwise operations required to convert to code point scalar values. 
     3. Incompatible with many legacy text-processing tools and protocols. "
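
To make the second disadvantage concrete, here is a minimal sketch of the naive scheme from the question, "UTF-32 with the high byte removed". It is an assumption for illustration only: UTF-24 was never standardized, the function names `utf24_encode` and `utf24_decode` are hypothetical, big-endian byte order is an arbitrary choice, and the 2007 proposal describes extra properties (such as byte-order detection) that this sketch does not attempt.

```python
def utf24_encode(text: str) -> bytes:
    """Pack each code point into 3 big-endian bytes (hypothetical format)."""
    out = bytearray()
    for ch in text:
        out += ord(ch).to_bytes(3, "big")
    return bytes(out)


def utf24_decode(data: bytes) -> str:
    """Rebuild code points from 3-byte groups (hypothetical format)."""
    if len(data) % 3 != 0:
        raise ValueError("length is not a multiple of 3 bytes")
    chars = []
    for i in range(0, len(data), 3):
        # The extra bitwise work compared with UTF-32: a 3-byte unit cannot
        # be loaded as one native integer, so it is assembled by shifting.
        code_point = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        if code_point > 0x10FFFF:
            raise ValueError(f"value {code_point:#x} is outside the Unicode range")
        chars.append(chr(code_point))
    return "".join(chars)


# 'A' (ASCII), 'é' (BMP), and one Gothic letter (supplementary plane).
sample = "A\u00e9\U00010330"
encoded = utf24_encode(sample)
roundtrip = utf24_decode(encoded)
print(len(encoded), roundtrip == sample)  # prints "9 True"
```

Every character costs exactly 3 bytes here, which is the fixed-length advantage from the list, and the shift-and-or step in the decoder is the conversion cost that UTF-32 avoids.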
    

    David Starner argued against it in http://www.mail-archive.com/unicode@unicode.org/msg16011.html:

    Why? UTF-24 will almost invariably be larger then UTF-16, unless you
    are talking a document in Old Italic or Gothic. The math alphanumberic
    characters will almost always be combined with enough ASCII to make
    UTF-8 a win, and if not, enough BMP characters to make UTF-16 a win.
    Modern computers don’t deal with 24 bit chunks well; in memory, they’d
    take up 32 bits a piece, unless you declared them packed, and then
    they’d be a lot slower then UTF-16 or UTF-32. And if you’re storing to
    disk, you may as well use BOCU or SCSU (you’re already going
    non-standard), or use standard compression with UTF-8, UTF-16, BOCU or
    SCSU. SCSU or BOCU compressed should take up half the space of UTF-24,
    if that.
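
The size argument above can be checked with a small comparison. This is a sketch under stated assumptions: the `sizes` helper and the sample strings are illustrative, and the UTF-24 figure is simply 3 bytes per code point, since no such codec exists in Python.

```python
def sizes(text: str) -> dict:
    """Return the encoded size in bytes of text in each encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le avoids counting a BOM
        "utf-24": 3 * len(text),                  # hypothetical: 3 bytes per code point
        "utf-32": len(text.encode("utf-32-le")),
    }


samples = {
    "english": "hello world",                     # ASCII only
    "greek": "\u03b1\u03b2\u03b3",                # BMP, outside ASCII
    "gothic": "\U00010330\U00010331\U00010332",   # supplementary plane
}

for name, text in samples.items():
    print(name, sizes(text))
```

For the ASCII sample UTF-8 is smallest, for the Greek sample UTF-8 and UTF-16 tie at 2 bytes per character, and only the pure Gothic sample is smaller in the hypothetical UTF-24 (9 bytes against 12 for the other three).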

    You could also check the following Stack Overflow post:

    Why UTF-32 exists whereas only 21 bits are necessary to encode every character?

© 2021 The Archive Base. All Rights Reserved