According to the Java SE 7 specification, Java uses the UTF-16 encoding of Unicode to represent characters.
When imagining a String as a simple array of 16-bit variables each containing one character, life is simple.
Unfortunately, there are code points for which 16 bits simply aren’t enough: the supplementary code points outside the Basic Multilingual Plane (BMP), which make up 16/17 of the Unicode code space. In a String this poses no direct problem: to store one of these 1,048,576 code points, two array positions (a surrogate pair, four bytes in total) are used instead of one.
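The two-positions claim is easy to check. Below is a minimal sketch using only standard `String` and `Character` methods; the class name, the helper names and the sample code point (U+1F600) are chosen purely for illustration.

```java
final class SupplementaryInString {
    // Number of UTF-16 code units (char values) in s.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A"; // U+0041, inside the BMP
        // U+1F600 lies outside the BMP, so it needs a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1

        // The two array positions hold a high and a low surrogate.
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(supplementary.charAt(1)));  // true
    }
}
```

Note that `length()` counts code units, not characters, which is why it reports 2 for a single supplementary character.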
Strings cope with this because they can always use a further two bytes. A single variable is another matter: unlike UTF-16 text, it has a fixed length of 16 bits. So how can these characters be stored in one, and in particular, how does Java do it with its 2-byte “char” type?
The answer is in the Javadoc:
Simply said:
Even simpler said:
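To see what this means in code: a single `char` can only hold one UTF-16 code unit, so a supplementary code point has to be kept in an `int`, or in a `char[]` of length two. Here is a minimal sketch using the standard `java.lang.Character` methods; the class name, the helper names and the sample code point (U+1F600) are illustrative assumptions.

```java
final class CodePointInChar {
    // Encodes one code point as UTF-16: one char inside the BMP, two outside it.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Rebuilds a supplementary code point from its surrogate pair.
    static int decode(char high, char low) {
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // does not fit into 16 bits

        char[] units = encode(codePoint);
        System.out.println(units.length);                                 // 2
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D83D DE00
        System.out.println(Character.isSupplementaryCodePoint(codePoint)); // true
        System.out.println(decode(units[0], units[1]) == codePoint);       // true

        // A narrowing cast silently keeps only the low 16 bits.
        char truncated = (char) codePoint;
        System.out.printf("%04X%n", (int) truncated); // F600
    }
}
```

The last lines show why a plain cast is not an option: the result is a different, unrelated BMP code unit rather than an error.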
As an aside, the extension of Unicode past the BMP arguably made UTF-16 largely irrelevant, since it no longer offers even a fixed ratio of bytes to characters. That’s why more modern languages tend to be based on UTF-8. This manifesto helps to understand why.
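The lost fixed ratio can be made visible by comparing encoded lengths. This is a rough sketch, assuming Java 7 or later for `java.nio.charset.StandardCharsets`; the class and helper names are hypothetical, and `UTF_16BE` is used so that no byte order mark is counted.

```java
import java.nio.charset.StandardCharsets;

final class EncodedLengths {
    // Bytes needed to encode s as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to encode s as big-endian UTF-16 (no byte order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                     // U+0041
            "\u00E9",                                // U+00E9
            "\u20AC",                                // U+20AC
            new String(Character.toChars(0x1F600))   // U+1F600, outside the BMP
        };
        for (String s : samples) {
            // Prints "1 2", "2 2", "3 2" and "4 4" respectively.
            System.out.println(utf8Bytes(s) + " " + utf16Bytes(s));
        }
    }
}
```

Both encodings end up variable-length: UTF-16 needs 2 or 4 bytes per code point, UTF-8 between 1 and 4.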