I have a Unicode (UTF-8 without BOM) text file within a jar, that’s loaded as a resource.
URL resource = MyClass.class.getResource("datafile.csv");
InputStream stream = resource.openStream();
BufferedReader reader = new BufferedReader(
        new InputStreamReader(stream, Charset.forName("UTF-8")));
This works fine on Windows, but on Linux it appears not to read the file correctly: accented characters come out broken. I'm aware that different machines can have different default charsets, but I'm passing the correct charset explicitly. Why would it not be using it?
The reading code looks correct; I use exactly that on Linux all the time.
I suspect you used the default encoding somewhere when writing the text out to the web page. Because Linux and Windows have different default encodings, you saw different results on the two platforms.
For example, you use the default encoding if you do something like this in a servlet,
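The servlet code the answer refers to was not included. The pitfall it describes can be shown without the servlet API; a minimal sketch (the string `"café"` is just an illustration, not from the original post):

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    public static void main(String[] args) {
        String text = "café";
        // UTF-8 encodes 'é' as two bytes, ISO-8859-1 as one:
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length);   // 5
        System.out.println(latin1.length); // 4

        // Decoding UTF-8 bytes with the wrong charset produces mojibake,
        // which is exactly the "broken accented characters" symptom:
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled); // cafÃ©
    }
}
```

Calls such as `String.getBytes()` or `new OutputStreamWriter(stream)` with no charset argument silently use the platform default, so the same code emits different bytes on Windows and Linux.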
You need to write specifically in UTF-8, like this,
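The fix the answer has in mind was also omitted. A hedged sketch in plain Java I/O, naming the charset explicitly at every write (in a servlet the equivalent is calling `response.setCharacterEncoding("UTF-8")` before `response.getWriter()`):

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitUtf8Write {
    public static void main(String[] args) throws Exception {
        String text = "café"; // illustrative sample, not from the original post
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Passing StandardCharsets.UTF_8 pins the encoding, so the output
        // bytes are identical on every platform regardless of file.encoding.
        try (Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        System.out.println(buf.toByteArray().length); // 5: é is two bytes in UTF-8
    }
}
```

The same rule applies on the read side, which is why the question's `InputStreamReader(stream, Charset.forName("UTF-8"))` is already correct: the bug is on the output path, not the input path.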