I’ve never understood the point of UTF-16 encoding. If you need to be able


Asked: May 20, 2026


I’ve never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. a code point is the same as a code unit) then you need UTF-32, since UTF-16 is still variable length. If you don’t need this, then UTF-16 seems like a colossal waste of space compared to UTF-8. What are the advantages of UTF-16 over UTF-8 and UTF-32 and why do Windows and Java use it as their native encoding?


1 Answer

Editorial Team · Answer 1 · 2026-05-20T16:42:32+00:00

When Windows NT was designed, UTF-16 didn’t exist (NT 3.1 was released in 1993, while UTF-16 arrived in 1996 with the Unicode 2.0 standard). What existed instead was UCS-2, which at that time was enough to hold every character available in Unicode, so the equivalence 1 code point = 1 code unit actually held, and strings needed no variable-length logic.
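To make the "1 code point = 1 code unit" equivalence concrete, here is a minimal Python sketch. The helper name `utf16_units` and the sample string are illustrative assumptions, not anything from Windows or the Unicode standard. It decodes the UTF-16 bytes of a BMP-only string back into 16-bit code units and checks that unit index and character index line up.

```python
import struct

def utf16_units(text):
    """Return the 16-bit code units of `text` in UTF-16 (little-endian, no BOM)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))

# Hypothetical sample: every character lies inside the BMP.
bmp_text = "Grüße, мир"
units = utf16_units(bmp_text)

# For BMP-only text there is exactly one code unit per code point,
# so code unit i is character i (the property UCS-2 relied on).
print(len(bmp_text), len(units))
print(chr(units[3]) == bmp_text[3])
```

As long as the text stays inside the BMP, the character at index `i` sits at byte offset `2 * i`, which is the random access the original UCS-2 design gave for free.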

They moved to UTF-16 later to support the whole Unicode character set. However, they couldn’t move to UTF-8 or UTF-32, because that would have broken binary compatibility in the API (among other things).

As for Java, I’m not really sure. Since it was released around 1995, I suspect that UTF-16 was already in the air (even if it wasn’t standardized yet), but I think compatibility with NT-based operating systems may have played some role in the choice (continuous UTF-8 <-> UTF-16 conversions on every call to Windows APIs can introduce some slowdown).

Edit

Wikipedia explains that Java went the same way: it originally supported UCS-2 and moved to UTF-16 in J2SE 5.0.

So, in general, when you see UTF-16 used in some API or framework, it is because it started as UCS-2 (to avoid complications in the string-management algorithms) and later moved to UTF-16 to support code points outside the BMP while keeping the same code unit size.
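As a rough illustration of how UTF-16 reaches beyond the BMP without changing the code unit size, the sketch below encodes a single code point by hand using the standard surrogate-pair arithmetic. The function name `to_utf16_units` is a hypothetical helper for this example; the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
def to_utf16_units(code_point):
    """Encode one Unicode code point as one or two 16-bit UTF-16 code units."""
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]           # BMP: a single code unit, same as UCS-2
    offset = code_point - 0x10000     # 20 bits left for the supplementary planes
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate carries the top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate carries the bottom 10 bits
    return [high, low]

clef = 0x1D11E  # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP
units = to_utf16_units(clef)
print([hex(u) for u in units])  # ['0xd834', '0xdd1e']

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
encoded = chr(clef).encode("utf-16-be")
assert encoded == b"".join(u.to_bytes(2, "big") for u in units)
```

Because each surrogate is still an ordinary 16-bit value, an API built around 16-bit UCS-2 buffers can carry these pairs unchanged, which is the compatibility argument made above.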
