I'm trying to understand some String class methods in Java. Here is a simple example:
/* different experiments with String class */
public class TestStrings {
    public static void main(String[] args) {
        String greeting = "Hello\uD835\uDD6b";
        System.out.println("Number of code units in greeting is " + greeting.length());
        System.out.println("Number of code points " + greeting.codePointCount(0, greeting.length()));
        int index = greeting.offsetByCodePoints(0, 6);
        System.out.println("index = " + index);
        int cp = greeting.codePointAt(index);
        System.out.println("Code point at index is " + (char) cp);
    }
}
\uD835\uDD6b is the double-struck small z symbol (U+1D56B, 𝕫), so it is a valid surrogate pair.
So the string has 6 (six) code points and 7 (seven) code units (2-byte chars). The documentation says:
offsetByCodePoints
public int offsetByCodePoints(int index, int codePointOffset)
Returns the index within this String that is offset from the given index by codePointOffset code points. Unpaired surrogates within the text range given by index and codePointOffset count as one code point each.
Parameters:
index – the index to be offset
codePointOffset – the offset in code points
So the offset argument is given in code points. With the arguments (0, 6) the call works fine, without exceptions. But the following codePointAt() call fails, because the returned index is 7, which is out of bounds. So maybe the function takes its arguments in code units? Or have I missed something?
codePointAt takes a char index.

There are six code points in that string. The offsetByCodePoints call returns the index after 6 code points, which is char index 7. You then try to get codePointAt(7), which is at the end of the string.

To see why, consider what "".offsetByCodePoints(0, 0) returns: it is 0, because to count past all 0 code points, you have to count past all 0 chars.

Extrapolating that to your string, to count past all 6 code points, you have to count past all 7 chars.

Maybe seeing codePointAt in use will make this clear. This is the idiomatic way to iterate over all code points in a string (or CharSequence):
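A minimal sketch of such a loop follows. The class name CodePointIteration and the helper codePointsOf are made up for illustration and are not part of the Java API. The loop reads the code point at a char index, then advances by the one or two chars that code point occupies. The main method also shows that the last code point of the question's string starts at char index 5, which is what offsetByCodePoints(0, 5) returns.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointIteration {
    // Hypothetical helper: collects every code point of s in order.
    static List<Integer> codePointsOf(CharSequence s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0, n = s.length(); i < n; ) {
            int cp = Character.codePointAt(s, i); // i is a char index, not a code point index
            result.add(cp);
            i += Character.charCount(cp);         // advance by 1 char, or 2 for a surrogate pair
        }
        return result;
    }

    public static void main(String[] args) {
        String greeting = "Hello\uD835\uDD6b";

        for (int cp : codePointsOf(greeting)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }

        // Skipping 5 code points lands on the start of the 6th one.
        int codePointCount = greeting.codePointCount(0, greeting.length());
        int last = greeting.offsetByCodePoints(0, codePointCount - 1);
        int cp = greeting.codePointAt(last);

        // A cast to char would truncate a supplementary code point,
        // so convert through Character.toChars instead.
        System.out.println("Last code point: " + new String(Character.toChars(cp)));
    }
}
```

Skipping all 6 code points instead returns 7, which equals greeting.length(). That value is a valid end position but not a valid argument to codePointAt.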