What is the “correct” way of comparing a code-point to a Java character? For example:
int codepoint = str.codePointAt(0); // str is any String instance; codePointAt is not a static method
char token = '\n';
I know I can probably do:
if (codepoint==(int) token)
{ ... }
but this code looks fragile. Is there a formal API method for comparing codepoints to chars, or converting the char up to a codepoint for comparison?
A little bit of background: When Java appeared in 1995, the char type was based on the original “Unicode 88” specification, which was limited to 16 bits. A year later, when Unicode 2.0 was released, the concept of surrogate characters was introduced to go beyond the 16-bit limit.
Java internally represents all Strings in UTF-16 format. For code points exceeding U+FFFF, the code point is represented by a surrogate pair, i.e., two chars, the first being the high-surrogate code unit (in the range \uD800-\uDBFF) and the second being the low-surrogate code unit (in the range \uDC00-\uDFFF).
From the early days, all basic Character methods were based on the assumption that a code point could be represented in one char, so that’s what the method signatures look like. I guess that was not changed when Unicode 2.0 came around in order to preserve backward compatibility, so caution is needed when dealing with them. The Java documentation for Character warns that methods accepting only a char cannot support supplementary characters and treat char values from the surrogate ranges as undefined characters.
Casting the char to an int, as you do in your sample, works fine though.
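To make that concrete, here is a minimal sketch of the comparison. The class name CodePointCompare and the helper matches are made up for illustration and are not part of any Java API; the calls on String and Character are standard. Note that the explicit cast is optional, since a char is widened to int automatically in the comparison.

```java
public class CodePointCompare {

    // Hypothetical helper: true if the code point is exactly the given char.
    // The char is widened to int implicitly, so no cast is required.
    public static boolean matches(int codePoint, char token) {
        return codePoint == token;
    }

    public static void main(String[] args) {
        // A newline is in the Basic Multilingual Plane, so one char holds it.
        String bmp = "\nabc";
        System.out.println(matches(bmp.codePointAt(0), '\n')); // true

        // U+1F600 lies above U+FFFF, so the String stores it as a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));
        int cp = supplementary.codePointAt(0);

        System.out.println(supplementary.length());       // 2
        System.out.println(Character.charCount(cp));      // 2
        System.out.println(Character.isBmpCodePoint(cp)); // false
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(supplementary.charAt(1)));  // true

        // A supplementary code point is always greater than any char value,
        // so it can never compare equal to a single char.
        System.out.println(matches(cp, supplementary.charAt(0))); // false
    }
}
```

So for a token like '\n' the plain == comparison does the right thing: it matches when the code point is that character and simply fails to match when the code point needs two chars.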