I open Notepad (Windows) and write
Some lines with special characters
Special: Žđšćč
and go to Save As… “someFile.txt” with Encoding set to UTF-8.
In Java I have
FileInputStream fis = new FileInputStream(new File("someFile.txt"));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);
String line;
while((line = in.readLine()) != null) {
printLine(line);
}
in.close();
But I get question marks and similar “special” characters. Why?
EDIT: I have this input (one line in .txt file)
665,Žđšćč
and this code
FileInputStream fis = new FileInputStream(new File(fileName));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);
String line;
while((line = in.readLine()) != null) {
Toast.makeText(mContext, line, Toast.LENGTH_LONG).show();
Pattern p = Pattern.compile(",");
String[] article = p.split(line);
Toast.makeText(mContext, article[0], Toast.LENGTH_LONG).show();
Toast.makeText(mContext, Integer.parseInt(article[0]), Toast.LENGTH_LONG).show();
}
in.close();
The Toast output is fine (for those who aren’t familiar with Android, a Toast is just a pop-up on screen with some text in it). The console shows “weird characters”, probably because of the encoding of the console window. But parsing the integer fails according to the console output, even though the Toast output looks fine. What is the problem?
It seems the String contains some “weird” characters that Toast doesn’t show or render, but parsing it crashes. Suggestions?
If I set the encoding to ANSI in Notepad, the integer parsing works and there are no weird characters, but of course my special characters no longer work.
It’s the output console that doesn’t support those characters. Since you’re using Eclipse, you need to ensure that it’s configured to use UTF-8 for this. You can do this via Window > Preferences > General > Workspace > Text File Encoding, set to UTF-8.
Update: as per the updated question and the comments, the UTF-8 BOM is apparently the culprit. Notepad by default adds the UTF-8 BOM on save. It looks like the JRE on your HTC doesn’t swallow that. You may want to consider using the UnicodeReader example as outlined in this answer instead of InputStreamReader in your code. It autodetects and skips the BOM.
Unrelated to the actual problem, it’s good practice to close resources in a finally block so that they are closed even if an exception is thrown.
Also unrelated, I’d suggest putting Pattern p = Pattern.compile(","); outside the loop, or even making it a static constant, because compiling it is relatively expensive and there is no need to do it on every iteration of the loop.
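To make the three suggestions above concrete, here is a minimal sketch in plain Java (no Android classes). It is not the UnicodeReader class mentioned above. It swaps in a simpler technique: decode as UTF-8 and drop a leading U+FEFF character from the first line if one is present. The names BomSkippingExample, readRecords and COMMA are hypothetical, chosen only for illustration, and the sample bytes in main stand in for the file that Notepad saved.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BomSkippingExample {
    // Compiled once as a constant instead of on every loop iteration.
    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical helper: reads comma-separated lines as UTF-8 and
    // removes a leading BOM (decoded as U+FEFF) from the first line.
    public static List<String[]> readRecords(InputStream source) throws IOException {
        List<String[]> records = new ArrayList<>();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8));
        try {
            String line;
            boolean firstLine = true;
            while ((line = in.readLine()) != null) {
                if (firstLine && line.startsWith("\uFEFF")) {
                    line = line.substring(1); // drop the BOM character
                }
                firstLine = false;
                records.add(COMMA.split(line));
            }
        } finally {
            in.close(); // runs even if reading throws
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample input: a BOM, then "665," and the five special
        // characters from the question, written as Unicode escapes.
        String content = "\uFEFF665,\u017D\u0111\u0161\u0107\u010D\n";
        InputStream source =
                new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));

        List<String[]> records = readRecords(source);
        int id = Integer.parseInt(records.get(0)[0]); // no NumberFormatException
        System.out.println(id);
        System.out.println(records.get(0)[1].length());
    }
}
```

Without the startsWith check, the first field would be the BOM character followed by "665", and Integer.parseInt would reject it. That matches the symptom in the question: the text looks fine on screen because the BOM is invisible, but parsing fails.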