For some reason, String.split does not behave as expected in certain cases. It must have something to do with the encoding of the files. However, I made sure that in Eclipse the Encoding setting says “Default MacRoman” in all cases, which is the default used in all the code files.
Nevertheless, when I copy this string from one of my Java files into another one and then type the same string again by hand:
"Test String" (copied) and "Test String" (typed again)
there is a difference. In the first one the space is encoded as 160, while in the second one it is 32.
So when using split I have to make sure to use the correctly encoded space.
This is how I do it now, which I think is not very elegant (longText contains the text to be split):
// Use an ordinary space (32) if the text contains one, otherwise a non-breaking space (160).
char splitChar;
if (longText.indexOf((char) 32) >= 0) {
    splitChar = (char) 32;
} else {
    splitChar = (char) 160;
}
String[] tokens = longText.split(String.valueOf(splitChar));
Is there a better way to do this?
PS: Just explicitly changing the encoding of a file in Eclipse to MacRoman does not work.
160 is the Latin-1 (and hence Unicode) code point for the non-breaking space character (U+00A0). It is different from a normal space (32, U+0020).
The MacRoman character set has this at a different codepoint (202). Generally, for editing Java source, you should be using a Unicode encoding such as UTF-8.
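If the text may still contain either kind of space, one option is to split on both at once instead of guessing which one is present. The following is a minimal sketch, not the only way to do it; the class name SplitDemo and the helper splitOnSpaces are made up for illustration. It relies on String.split taking a regular expression, so a character class can list the ordinary space and the non-breaking space (written as the regex escape \u00A0) together.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: splits on runs of ordinary spaces (U+0020)
    // and non-breaking spaces (U+00A0), treating both as separators.
    static String[] splitOnSpaces(String longText) {
        return longText.split("[ \\u00A0]+");
    }

    public static void main(String[] args) {
        // The first gap is a non-breaking space, the others are ordinary spaces.
        String text = "Test\u00A0String and more";
        System.out.println(Arrays.toString(splitOnSpaces(text)));
        // prints [Test, String, and, more]
    }
}
```

Note that a separator at the very start of the text would still produce an empty first token, as with any use of String.split.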