I have a Unicode (UTF-8 without BOM) text file within a jar, that’s loaded as a resource.
URL resource = MyClass.class.getResource("datafile.csv");
InputStream stream = resource.openStream();
BufferedReader reader = new BufferedReader(
        new InputStreamReader(stream, Charset.forName("UTF-8")));
This works fine on Windows, but on Linux it appears not to read the file correctly: accented characters come out broken. I'm aware that different machines can have different default charsets, but I'm passing the correct charset explicitly. Why would it not be using it?
The reading code looks correct; I use exactly that on Linux all the time.
I suspect you used the default encoding somewhere when writing the text out to the web page. Because Linux and Windows have different default encodings, you saw different results on the two platforms.
For example, you use the default encoding if you do something like this in a servlet,
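The servlet code the answer refers to was not included. The pitfall it describes can be shown without the servlet API; a minimal sketch (the string `"café"` is just an illustration, not from the original post):

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    public static void main(String[] args) {
        String text = "café";
        // UTF-8 encodes 'é' as two bytes, ISO-8859-1 as one:
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length);   // 5
        System.out.println(latin1.length); // 4

        // Decoding UTF-8 bytes with the wrong charset produces mojibake,
        // which is exactly the "broken accented characters" symptom:
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled); // cafÃ©
    }
}
```

Calls such as `String.getBytes()` or `new OutputStreamWriter(stream)` with no charset argument silently use the platform default, so the same code emits different bytes on Windows and Linux.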
You need to write specifically in UTF-8, like this,
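The fix the answer has in mind was also omitted. A hedged sketch in plain Java I/O, naming the charset explicitly at every write (in a servlet the equivalent is calling `response.setCharacterEncoding("UTF-8")` before `response.getWriter()`):

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitUtf8Write {
    public static void main(String[] args) throws Exception {
        String text = "café"; // illustrative sample, not from the original post
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Passing StandardCharsets.UTF_8 pins the encoding, so the output
        // bytes are identical on every platform regardless of file.encoding.
        try (Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        System.out.println(buf.toByteArray().length); // 5: é is two bytes in UTF-8
    }
}
```

The same rule applies on the read side, which is why the question's `InputStreamReader(stream, Charset.forName("UTF-8"))` is already correct: the bug is on the output path, not the input path.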