For instance, class files use CESU-8 (or, more precisely, its close variant Modified UTF-8, a.k.a. MUTF-8), but internally Java first used UCS-2 and now it uses UTF-16. The Java Language Specification says that a minimal conforming Java compiler only has to accept ASCII characters in source files.
What’s the reason for these choices? Wouldn’t it make more sense to use the same encoding throughout the Java ecosystem?
ASCII for source files is because at the time it wasn’t considered reasonable to expect people to have text editors with full Unicode support. Things have improved since, but they still aren’t perfect. The whole \uXXXX escape mechanism is essentially Java’s equivalent to C’s trigraphs. (When C was created, some keyboards didn’t have curly braces, so you had to use trigraphs!)

At the time Java was created, the class file format used UTF-8 and the runtime used UCS-2. Unicode had fewer than 64k codepoints, so 16 bits was enough. Later, when additional “planes” were added to Unicode, UCS-2 was replaced with the (pretty much) compatible UTF-16, and UTF-8 was replaced with CESU-8 (hence “Compatibility Encoding Scheme…”).
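As a quick sketch of what the \uXXXX mechanism does: the compiler translates these escapes in an early phase, before lexing, so an escaped character is indistinguishable from typing the character directly (class name here is just illustrative):

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // \u0041 is translated to 'A' before the lexer ever sees it,
        // so this string literal is simply "A".
        String s = "\u0041";
        System.out.println(s.equals("A")); // true
    }
}
```

This pre-lexing translation is exactly why it plays the same role trigraphs did in C: it lets you write any Unicode character while keeping the file itself pure ASCII.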
In the class file format they wanted to use UTF-8 to save space. The design of the class file format (including the JVM instruction set) was heavily geared towards compactness.
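To get a feel for the savings: `DataOutputStream.writeUTF` emits the same length-prefixed modified-UTF-8 layout that class files use for constant-pool strings, so you can measure what a string costs on disk (demo class name is my own):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ClassFileStringDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF("Hello");

        // 2-byte length prefix + 5 single-byte ASCII characters = 7 bytes.
        // Stored as UTF-16, the character data alone would take 10 bytes.
        System.out.println(buf.size()); // 7
    }
}
```

Since identifiers in class files (method names, type descriptors, etc.) are overwhelmingly ASCII, the one-byte-per-character case dominates, which is the compactness the format was designed around.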
In the runtime they wanted to use UCS-2 because it was felt that saving space was less important than avoiding the need to deal with variable-width characters. Unfortunately, this kind of backfired now that it’s UTF-16: a codepoint outside the Basic Multilingual Plane takes two “chars”, and worse, the “char” datatype is now somewhat misnamed (it no longer corresponds to a character in general, but to a UTF-16 code unit).
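The char/codepoint mismatch is easy to demonstrate with any character outside the Basic Multilingual Plane, for example U+1D11E (musical symbol G clef), which UTF-16 encodes as a surrogate pair:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1D11E is encoded as the surrogate pair D834 DD1E in UTF-16.
        String clef = "\uD834\uDD1E";

        System.out.println(clef.length());                         // 2 (code units)
        System.out.println(clef.codePointCount(0, clef.length())); // 1 (codepoint)
        // charAt(0) is not a character at all, just half of one:
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
    }
}
```

This is why code that indexes strings char-by-char can silently split characters in half; the codepoint-aware APIs (codePointAt, codePoints(), offsetByCodePoints) exist precisely to paper over this.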