Referring to Joel’s article: “Some people are under the misconception that Unicode is simply…”


Asked: May 13, 2026


From Joel’s article:

“Some people are under the misconception that Unicode is simply a 16-bit code where each character takes 16 bits and therefore there are 65,536 possible characters. This is not, actually, correct.”

After reading the whole article, my understanding is this: if someone tells you his text is in Unicode, you have no idea how much memory each of his characters takes up. He has to tell you, “My Unicode text is encoded in UTF-8”; only then will you know how much memory each of his characters takes up.

Unicode = not necessarily 2 bytes per character
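A quick way to check this is to encode the same characters with different encodings and count the bytes. The Python sketch below encodes each character separately; the helper name `bytes_per_char` and the sample string are illustrative choices, not taken from the article. The `-le` codec variants are used so Python does not prepend a byte-order mark.

```python
# Sample text: ASCII 'A', 'é' (U+00E9), '€' (U+20AC), and an emoji (U+1F600, outside the BMP)
TEXT = "A\u00e9\u20ac\U0001F600"

def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Return how many bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]

for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, bytes_per_char(TEXT, enc))
# utf-8 [1, 2, 3, 4]
# utf-16-le [2, 2, 2, 4]
# utf-32-le [4, 4, 4, 4]
```

UTF-8 uses 1 to 4 bytes per character and UTF-16 uses 2 or 4, so the size depends on the encoding, not on “Unicode” itself.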

However, Code Project’s article and Microsoft’s Help confuse me:

Microsoft:

“Unicode is a 16-bit character encoding, providing enough encodings for all languages. All ASCII characters are included in Unicode as ‘widened’ characters.”
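“Widened” here means each ASCII code is zero-extended to a 16-bit code unit, which is what UTF-16 does. A minimal Python sketch, assuming little-endian UTF-16 (the byte order Windows normally uses); `widen_ascii` is a hypothetical helper that performs the zero-extension by hand and is checked against Python’s built-in `utf-16-le` codec:

```python
def widen_ascii(text: str) -> bytes:
    """Zero-extend each ASCII byte to a 16-bit little-endian code unit."""
    raw = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    # Little-endian: the ASCII value comes first, then the zero high byte.
    return b"".join(bytes([b, 0]) for b in raw)

sample = "Hi!"
widened = widen_ascii(sample)
print(widened)                                  # b'H\x00i\x00!\x00'
print(widened == sample.encode("utf-16-le"))    # True
print(len(sample.encode("utf-8")))              # 3: UTF-8 keeps ASCII at one byte each
```

The last line is the contrast: in UTF-8, ASCII characters are not widened at all, so “2 bytes per character” describes UTF-16 specifically.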

Code Project:

“The Unicode character set is a ‘wide character’ (2 bytes per character) set that contains every character available in every language, including all technical symbols and special publishing characters. Multibyte character set (MBCS) uses either 1 or 2 bytes per character.”
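The “1 or 2 bytes” behaviour of an MBCS can be seen with a legacy Windows code page. This sketch assumes code page 932 (the Windows variant of Japanese Shift_JIS), which Python exposes as the `cp932` codec; `mbcs_lengths` is a hypothetical helper:

```python
def mbcs_lengths(text: str, codepage: str = "cp932") -> list[int]:
    """Bytes used by each character in a legacy Windows multibyte code page."""
    return [len(ch.encode(codepage)) for ch in text]

# 'A', hiragana 'a' (U+3042), half-width katakana 'a' (U+FF71)
print(mbcs_lengths("A\u3042\uff71"))  # [1, 2, 1]

# A code page only covers the characters it was designed for.
try:
    "\U0001F600".encode("cp932")
except UnicodeEncodeError:
    print("U+1F600 has no cp932 encoding")
```

Unlike a Unicode encoding, an MBCS code page like this one represents only a fixed subset of characters.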

Unicode = 2 bytes per character?

Are 65,536 possible characters enough to represent every language in the world?
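For scale, the Unicode code space runs from U+0000 to U+10FFFF: 17 planes of 65,536 code points each (not all of which are usable as characters). The Python sketch below does that arithmetic and classifies a few characters; the constant and function names are illustrative. U+4E2D is a common Chinese character inside the first 65,536 code points, while U+20000 (a CJK Extension B ideograph) and U+1F600 (an emoji) lie beyond them.

```python
import sys

PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # highest Unicode code point

def needs_more_than_16_bits(ch: str) -> bool:
    """True if the character's code point lies outside the first 65,536 (the BMP)."""
    return ord(ch) >= PLANE_SIZE

print((MAX_CODE_POINT + 1) // PLANE_SIZE)  # 17 planes
print(MAX_CODE_POINT + 1)                  # 1114112 code points
print(sys.maxunicode == MAX_CODE_POINT)    # True: a Python str can hold any of them

for ch in ("\u4e2d", "\U00020000", "\U0001F600"):
    print(f"U+{ord(ch):04X}", needs_more_than_16_bits(ch))
# U+4E2D False
# U+20000 True
# U+1F600 True
```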

Why does the concept seem to differ between the web developer community and the desktop developer community?


Editorial Team · Answer 1 · 2026-05-13T21:34:14+00:00

Once upon a time, Unicode had only as many characters as fit in 16 bits, and UTF-8 did not exist or was not the de facto encoding to use.

These factors led UTF-16 (or rather, what is now called UCS-2) to be considered synonymous with “Unicode”, because it was, after all, the encoding that supported all of Unicode.

In practice, you will see “Unicode” used where “UTF-16” or “UCS-2” is meant. This is a historical confusion that should be ignored and not propagated. Unicode is a character set, which assigns a number (a code point) to each character; UTF-8, UTF-16, and UCS-2 are different encodings that turn those code points into bytes.
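To make the distinction concrete: the character set gives é the code point U+00E9, and each encoding turns that number into different bytes. The Python sketch below (the helper name `encode_all` is hypothetical) also shows why the encoding has to be known: UTF-8 bytes decoded as Latin-1 come out as two wrong characters.

```python
def encode_all(ch: str) -> dict[str, str]:
    """Hex bytes of one character under several Unicode encodings."""
    # Big-endian codecs are used so no byte-order mark is added.
    return {enc: ch.encode(enc).hex(" ") for enc in ("utf-8", "utf-16-be", "utf-32-be")}

e_acute = "\u00e9"
print(f"U+{ord(e_acute):04X}")  # U+00E9
for enc, hexbytes in encode_all(e_acute).items():
    print(enc, hexbytes)
# utf-8 c3 a9
# utf-16-be 00 e9
# utf-32-be 00 00 00 e9

# Same bytes, wrong decoder: UTF-8 bytes read as Latin-1 give mojibake.
utf8_bytes = e_acute.encode("utf-8")
print(utf8_bytes.decode("latin-1"))  # Ã©
```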

(The difference between UTF-16 and UCS-2 is that UCS-2 is a true 16-bits-per-“character” encoding, and therefore encodes only the “BMP” (Basic Multilingual Plane) portion of Unicode, whereas UTF-16 uses “surrogate pairs” (for a total of 32 bits) to encode above-BMP characters.)
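The surrogate-pair arithmetic is small enough to write out. Below is a minimal Python sketch of the standard UTF-16 mapping (`to_surrogate_pair` is an illustrative name), checked against Python’s own `utf-16-be` codec. A UCS-2 encoder has no way to represent such code points at all.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return high, low

high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# Two 16-bit code units, 32 bits in total, matching Python's UTF-16 encoder.
manual = high.to_bytes(2, "big") + low.to_bytes(2, "big")
print(manual == "\U0001F600".encode("utf-16-be"))  # True
```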
