I read a UTF-8 file like this:
br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), Charset.forName("UTF-8")));
I would like to know: what is the charset of the String returned when I call br.readLine()?
Eclipse on my computer uses GBK as its default charset.
Technically, the file is being read using the UTF-8 charset, because that is what you told the InputStreamReader to do. The underlying bytes of the file content are interpreted (decoded) as UTF-8. The readLine() method returns a String, which stores its characters internally in Java's own representation, UTF-16. So once the line has been read, the String no longer carries the file's charset at all.

What happens thereafter depends entirely on what you do with this String. If you write it back to a file using a Writer without specifying the charset, the platform's default charset will be used. If you display it on stdout, the stdout's charset will be used, which depends on the runtime environment (command console? IDE? etc.). If you save it in a database, it depends on the JDBC driver configuration and/or the DB table encoding. And so on.

Apparently you are printing it to stdout in Eclipse's console with System.out.println(). In that case, the GBK charset will be used to display the characters, which will mangle any characters read from the UTF-8 file that GBK cannot represent. You need to configure Eclipse to use UTF-8 as its text file encoding. That can be done via Window > Preferences > General > Workspace > Text file encoding.
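To illustrate the decode/encode boundaries described above, here is a minimal sketch. The class and method names (CharsetDemo, readFirstLine, writeUtf8) are hypothetical, and an in-memory ByteArrayInputStream stands in for the FileInputStream so the example runs without a real file. It shows that readLine() yields characters rather than UTF-8 bytes, and that writing the String back with an explicit UTF-8 charset reproduces the original bytes instead of falling back to the platform default (GBK on your machine).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDemo {

    // Decodes UTF-8 bytes into a String, as the question's BufferedReader does.
    // Hypothetical helper: an in-memory stream replaces the FileInputStream.
    public static String readFirstLine(byte[] utf8Bytes) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    // Encodes the String back to bytes with an explicit charset instead of
    // relying on the platform default charset.
    public static byte[] writeUtf8(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "é" (U+00E9) takes 2 bytes in UTF-8 and "€" (U+20AC) takes 3.
        byte[] fileContent = "caf\u00e9 \u20ac".getBytes(StandardCharsets.UTF_8);
        String line = readFirstLine(fileContent);

        // The String holds 6 chars decoded from 9 UTF-8 bytes.
        System.out.println(line.length() + " chars decoded from " + fileContent.length + " bytes");

        // Re-encoding with an explicit UTF-8 charset reproduces the original bytes.
        System.out.println("round trip intact: " + Arrays.equals(writeUtf8(line), fileContent));

        // This emits UTF-8 bytes to stdout; whether they display correctly still
        // depends on the encoding the console itself uses to decode them.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(line);
    }
}
```

Note that the last step only changes which bytes are sent to stdout. If the Eclipse console still decodes them as GBK, the output will look garbled, which is why the console/workspace encoding setting above is still needed.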