In Java I create a string that uses Unicode combining overlines because I am trying to display square roots of numbers. I need to know the length of the string for formatting. When using Unicode combining characters, the usual methods for finding string length seem to fail, as the following example shows. Can anyone help me find the length of the second string when arbitrary numbers are under the square root, or offer tips on how to display square roots better?
String s = "\u221A"+"12";
String t = "\u221A"+"1"+"\u0305"+"2"+"\u0305";
System.out.println(s);
System.out.println(t);
System.out.println(s.length());
System.out.println(t.length());
Thanks for any help; I couldn’t find anything on this using Google.
They don’t fail; they report the string length as the number of Unicode characters [*]. If you need different behaviour, you need to define clearly what you mean by “string length”.
When you are interested in string lengths for display purposes, you are usually interested in counting pixels (or some other logical/physical unit), and that is the responsibility of the display layer (to begin with, different characters can have different widths if the font is not monospaced).
But if you’re just interested in counting the number of graphemes (“a minimally distinctive unit of writing in the context of a particular writing system”), here’s a nice guide with code and examples. Copying, trimming and pasting the relevant code from there, we’d have something like this:
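A minimal sketch of that approach, assuming java.text.BreakIterator with its default locale. The class and method names (GraphemeLength, getGraphemeLength) are illustrative and not taken from the guide:

```java
import java.text.BreakIterator;

final class GraphemeLength {
    // Counts grapheme clusters (user-perceived characters) rather than
    // UTF-16 code units. Uses the default locale's character BreakIterator.
    static int getGraphemeLength(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        // Each call to next() advances to the following grapheme boundary.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "\u221A" + "12";
        String t = "\u221A" + "1" + "\u0305" + "2" + "\u0305";
        System.out.println(s.length());           // 3 code units
        System.out.println(t.length());           // 5 code units
        System.out.println(getGraphemeLength(s)); // 3 graphemes
        System.out.println(getGraphemeLength(t)); // 3 graphemes
    }
}
```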
Bear in mind: the above uses the default locale. A more flexible and robust method would, e.g., receive an explicit locale as an argument and invoke BreakIterator.getCharacterInstance(locale) instead.
[*] To be precise, as pointed out in the comments, String.length() counts Java characters, which are actually code units in a UTF-16 encoding. This is equivalent to counting Unicode characters only if we are inside the BMP (Basic Multilingual Plane).
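To make the footnote concrete, here is a small sketch comparing code units with code points. The class and helper names are made up for illustration; String.codePointCount is the standard library call doing the work:

```java
final class CodeUnitsVsCodePoints {
    // Number of Unicode code points, as opposed to UTF-16 code units.
    static int codePointLength(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // SQUARE ROOT (U+221A) lies inside the BMP: one code unit, one code point.
        String bmp = "\u221A";
        // MATHEMATICAL DOUBLE-STRUCK DIGIT ONE (U+1D7D9) lies outside the BMP,
        // so UTF-16 stores it as a surrogate pair: two code units, one code point.
        String astral = "\uD835\uDFD9";

        System.out.println(bmp.length() + " " + codePointLength(bmp));       // 1 1
        System.out.println(astral.length() + " " + codePointLength(astral)); // 2 1
    }
}
```

Note that counting code points still does not merge combining marks: the overlined string from the question has 5 code points as well as 5 code units, so grapheme counting is what is needed there.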