I am having difficulty writing data out as UTF-8. I have a test case where the data I read from an input file contains a British pound symbol (hex C2A3). When I write it out on Linux, I get valid UTF-8 (C2A3). On Windows, I get only hex A3.
I tried using a PrintStream and specifying the character set as "UTF-8". No luck. I tried many other streams, also without success, until I finally tried a DataOutputStream. I used its write() method, which takes a byte array as a parameter. I needed to output a string, so I called myString.getBytes("UTF-8").
I ended up with code like:
dataOutputStream.write(myString.getBytes("UTF-8"));
This works properly on both systems: Windows 7 and Linux.
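For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. The class name Utf8WriteSketch and the helper writeUtf8 are made up for illustration, and a ByteArrayOutputStream stands in for the real output file so the bytes can be inspected. It uses StandardCharsets.UTF_8 instead of the string "UTF-8", which should behave the same but avoids the checked UnsupportedEncodingException.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8WriteSketch {
    // Hypothetical helper: encodes the string as UTF-8 ourselves and hands
    // the stream raw bytes, so the stream never has to pick a charset.
    public static byte[] writeUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // U+00A3 is the pound sign; print each written byte as hex.
        for (byte b : writeUtf8("\u00A3")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Because the encoding is named explicitly in getBytes, the bytes written should not depend on the platform the program runs on.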
I am trying to understand why this worked and to convince myself that my solution is correct. Does it come down to the system locale? Linux defaults to en_US.UTF-8, while all I could specify on Windows was just "en_US". So when the output stream got its data from the string, was the string encoding that data based on the locale?
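One way to test the locale theory is to compare what getBytes() produces with no argument against an explicit charset. The sketch below is only a diagnostic, not a confirmed explanation of the behaviour above. The class name DefaultCharsetSketch and the helper hex are made up, and ISO-8859-1 is used as a stand-in for a Windows single-byte code page, on the assumption that the Windows default was something like windows-1252 (both encode the pound sign as A3).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetSketch {
    // Hypothetical helper: renders bytes as space-separated uppercase hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3";
        // The platform default is what getBytes() with no argument uses.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("getBytes():      " + hex(pound.getBytes()));
        // Explicit charsets give the same bytes on every platform.
        System.out.println("UTF-8:           " + hex(pound.getBytes(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1:      " + hex(pound.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the first two lines differ between the Windows and Linux machines, that would support the idea that some stream or writer in the original code was falling back to the platform default. Note that on Java 18 and later the default charset is UTF-8 unless it is configured otherwise, so the difference may only show up on older runtimes.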
Or are you using a FileOutputStream, where the character encoding matters, or a DataOutputStream, where you write binary data? You should do some research too, but please take a look here.