For instance, class files use CESU-8 (or, more precisely, its close variant Modified UTF-8, a.k.a. MUTF-8), but internally Java first used UCS-2 and now it uses UTF-16. The Java Language Specification says that a minimal conforming Java compiler only has to accept ASCII characters in source files.
What’s the reason for these choices? Wouldn’t it make more sense to use the same encoding throughout the Java ecosystem?
ASCII for source files is because at the time it wasn’t considered reasonable to expect people to have text editors with full Unicode support. Things have improved since, but they still aren’t perfect. The whole \uXXXX escape mechanism is essentially Java’s equivalent to C’s trigraphs. (When C was created, some keyboards didn’t have curly braces, so you had to use trigraphs!)

At the time Java was created, the class file format used UTF-8 and the runtime used UCS-2. Unicode had fewer than 64k codepoints, so 16 bits was enough. Later, when additional “planes” were added to Unicode, UCS-2 was replaced with the (pretty much) compatible UTF-16, and UTF-8 was replaced with CESU-8 (hence “Compatibility Encoding Scheme…”).
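As a quick sketch of what the \uXXXX mechanism does: the compiler translates these escapes in an early phase, before lexing, so an escaped character is indistinguishable from typing the character directly (class name here is just illustrative):

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // \u0041 is translated to 'A' before the lexer ever sees it,
        // so this string literal is simply "A".
        String s = "\u0041";
        System.out.println(s.equals("A")); // true
    }
}
```

This pre-lexing translation is exactly why it plays the same role trigraphs did in C: it lets you write any Unicode character while keeping the file itself pure ASCII.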
In the class file format they wanted to use UTF-8 to save space. The design of the class file format (including the JVM instruction set) was heavily geared towards compactness.
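To get a feel for the savings: `DataOutputStream.writeUTF` emits the same length-prefixed modified-UTF-8 layout that class files use for constant-pool strings, so you can measure what a string costs on disk (demo class name is my own):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ClassFileStringDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF("Hello");

        // 2-byte length prefix + 5 single-byte ASCII characters = 7 bytes.
        // Stored as UTF-16, the character data alone would take 10 bytes.
        System.out.println(buf.size()); // 7
    }
}
```

Since identifiers in class files (method names, type descriptors, etc.) are overwhelmingly ASCII, the one-byte-per-character case dominates, which is the compactness the format was designed around.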
In the runtime they wanted to use UCS-2 because it was felt that saving space was less important than avoiding the need to deal with variable-width characters. Unfortunately, this kind of backfired now that it’s UTF-16: a codepoint outside the Basic Multilingual Plane takes two “chars”, and worse, the “char” datatype is now somewhat misnamed (it no longer corresponds to a character in general, but to a UTF-16 code unit).
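The char/codepoint mismatch is easy to demonstrate with any character outside the Basic Multilingual Plane, for example U+1D11E (musical symbol G clef), which UTF-16 encodes as a surrogate pair:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1D11E is encoded as the surrogate pair D834 DD1E in UTF-16.
        String clef = "\uD834\uDD1E";

        System.out.println(clef.length());                         // 2 (code units)
        System.out.println(clef.codePointCount(0, clef.length())); // 1 (codepoint)
        // charAt(0) is not a character at all, just half of one:
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
    }
}
```

This is why code that indexes strings char-by-char can silently split characters in half; the codepoint-aware APIs (codePointAt, codePoints(), offsetByCodePoints) exist precisely to paper over this.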