I know there is String#length and the various methods in Character which more or less work on code units/code points.
What is the suggested way in Java to actually compute the length of a string as specified by the Unicode standard (UAX #29), taking things like language/locale, normalization and grapheme clusters into account?
java.text.BreakIterator is able to iterate over text and can report on “character”, word, sentence and line boundaries. Consider this code:
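A minimal sketch of the idea, assuming a character-instance BreakIterator is an acceptable approximation of UAX #29 grapheme clusters. The class name GraphemeCounter and the method graphemeLength are hypothetical names chosen for illustration; only BreakIterator, Locale and the String methods are JDK APIs.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class GraphemeCounter {

    // Hypothetical helper: counts the "character" boundaries that
    // BreakIterator reports for the given text and locale.
    public static int graphemeLength(String text, Locale locale) {
        BreakIterator it = BreakIterator.getCharacterInstance(locale);
        it.setText(text);
        int count = 0;
        // Each successful next() moves past one user-perceived character.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String combining = "e\u0301";      // 'e' followed by COMBINING ACUTE ACCENT
        String surrogate = "\uD835\uDD0A"; // U+1D50A, stored as a surrogate pair

        for (String s : new String[] {"abc", combining, surrogate}) {
            System.out.println(
                "code units=" + s.length()
                + ", code points=" + s.codePointCount(0, s.length())
                + ", BreakIterator characters=" + graphemeLength(s, Locale.ROOT));
        }
    }
}
```

The combining sequence has two code units and two code points, and the surrogate pair has two code units and one code point, but BreakIterator reports one character for each. Results for more complex sequences, such as emoji joined with ZERO WIDTH JOINER, may differ between JDK versions, so treat the count as an approximation rather than a guaranteed UAX #29 result.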
Running it:
With surrogate pairs:
This should do the job in most cases.