My application is set up to store text as UTF-8. I am reading files that I get from various other organizations, which might be in UTF-8, Latin-1, ASCII, etc. Do I need to do anything special to ensure that files in these various character encodings are read in correctly? For example, do I need to figure out what character encoding each file is in and explicitly convert it to UTF-8?
Or is the following sufficient?
Reader reader = new InputStreamReader(new FileInputStream("c:/file.txt"), "UTF-8");
You have that wrong. You don’t read into an encoding; you read from an encoding. The encoding you provide as the second argument to InputStreamReader should be the expected encoding of the source stream (the file). Once the data is in memory as Java chars and Strings, it is always UTF-16. When you want to write the data (assuming you always want to write it as UTF-8), you will use: