I am trying to read in some sentences from a file that contains unicode characters. It does print out a string but for some reason it messes up the unicode characters
This is the code I have:
public static String readSentence(String resourceName) {
String sentence = null;
try {
InputStream refStream = ClassLoader
.getSystemResourceAsStream(resourceName);
BufferedReader br = new BufferedReader(new InputStreamReader(
refStream, Charset.forName("UTF-8")));
sentence = br.readLine();
} catch (IOException e) {
throw new RuntimeException("Cannot read sentence: " + resourceName);
}
return sentence.trim();
}
The problem is probably in the way that the string is being output.
I suggest that you confirm that you are correctly reading the Unicode characters by doing something like this:
and see if the Unicode codepoints are correct for the characters that are being messed up. If they are correct, then the problem is output side: if not, then input side.