i have a text file with WINDOWS-1252 characters like ø and ß. the file is being uploaded via form submit to a servlet, where it’s being parsed with opencsv and returned as a List object to a jsp page where it’s displayed.
the non-ascii chars are displayed as ? and i’m trying to figure out where along the way the encoding might have gone wrong.
i’ve tried a bunch of stuff:
- my page has the tag
  <%@page contentType="text/html" pageEncoding="WINDOWS-1252"%>
- file input is encoded:
  new InputStreamReader(new FileInputStream(file), "WINDOWS-1252")
- every string is encoded:
  s = new String(s.getBytes("WINDOWS-1252"));
where else can the encoding fail? any ideas?
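One common way a ? can show up, sketched here under the assumption that the bytes get decoded as UTF-8 somewhere along the way (that may or may not be what happens in this servlet). The class and method names are made up for illustration. The point is that once a byte has been decoded with the wrong charset, re-encoding the string afterwards cannot bring the original character back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original servlet code.
class QuestionMarkDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode windows-1252 bytes as UTF-8 (the assumed mistake),
    // then try to "repair" the result by re-encoding it.
    static byte[] decodeWronglyThenReencode(byte[] cp1252Bytes) {
        // A byte such as 0xF8 is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        String damaged = new String(cp1252Bytes, StandardCharsets.UTF_8);
        // U+FFFD has no windows-1252 mapping, so getBytes() writes '?'.
        return damaged.getBytes(CP1252);
    }

    // Decode with the charset the bytes were actually written in.
    static String decodeCorrectly(byte[] cp1252Bytes) {
        return new String(cp1252Bytes, CP1252);
    }
}
```

Feeding the single byte 0xF8 (ø in windows-1252) through the first method gives back the byte for ?, while the second method gives ø. Note also that new String(bytes) without a charset argument decodes with the platform default, so the result of the third attempt above depends on the machine it runs on.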
OK problem is fixed.
So the first problem was that it wasn’t a utf-8 file at all but a WINDOWS-1252 one. i determined that using the juniversalchardet lib (very helpful and easy-to-use).
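For a quick check without an extra dependency, here is a rough sketch of a much cruder test than what juniversalchardet does. It is not that library's API, and the class and method names are hypothetical. It only asks whether the bytes are strictly valid UTF-8 and otherwise falls back to windows-1252, which is an assumption that fits this case but not files in general.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a strict UTF-8 validity check, not real detection.
class CharsetGuess {
    // Returns true only if every byte sequence is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Assumption: anything that is not valid UTF-8 is treated as windows-1252.
    static Charset guess(byte[] data) {
        return isValidUtf8(data)
                ? StandardCharsets.UTF_8
                : Charset.forName("windows-1252");
    }
}
```

Plain ASCII passes as valid UTF-8 too, so this only tells the two apart once the data contains non-ascii bytes such as ø or ß.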
Then i had to make sure that i’m reading the file with the right charset by wrapping the FileInputStream in a reader that names it: new InputStreamReader(new FileInputStream(file), "WINDOWS-1252")
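A minimal sketch of that read step, assuming the upload is available as an InputStream. The class and method names are made up for illustration. The charset is passed to the InputStreamReader explicitly so the platform default never gets involved.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads all lines from a stream using a given charset.
class Cp1252Reader {
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the wrapped stream) when done.
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

For the file from the question this would be called with new FileInputStream(file) and Charset.forName("windows-1252"). When parsing with opencsv, the same InputStreamReader can be handed to the CSVReader constructor instead of reading lines by hand.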
then i just had to make sure that i am displaying it with the right charset in the jsp file using the tag
<%@page contentType="text/html" pageEncoding="WINDOWS-1252"%>
that’s pretty much it:
(1) determine charset
(2) make sure you’re reading the file right
(3) make sure you display it right