I’m currently making a program that saves Chinese words to a text file. I create the text file in Java and then try to write words to it. However, the text file I create is never encoded in UTF-8. This is the code I’m using; why doesn’t it work? I was told that there is a bug inherent in Java, but I have no idea how to get around it.
public void createFile(String name) {
    // try-with-resources flushes and closes the writer, even if write() throws
    try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream(name + ".txt"), "UTF-8"))) {
        out.write("");
    } catch (java.io.IOException e) {
        System.err.println("Something went wrong.");
    }
}
Also, do I have another option aside from text files with which I could still use UTF encoding?
Also I’m testing its encoding by opening the TextEdit application and trying to write Chinese characters. Could this also be a problem?
First, files themselves don’t have encodings. They’re just a sequence of bytes. If you write “asdf” in UTF-8, the result is byte-for-byte identical to plain old 7-bit ASCII.
If you were writing in, say, UTF-16, then the byte-order mark (BOM) at the start of the file would be a pretty clear indication that it’s written in UTF-16, even if the file held no other text, but UTF-8 does not require such a marker to be present.
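To make that concrete, here is a small sketch that prints the bytes Java produces for a few strings. The class name `EncodingBytes` and the `hex` helper are made up for illustration; the only library calls are `String.getBytes` with the `StandardCharsets` constants. The Chinese character is written as a Unicode escape so the example does not depend on how the source file itself is saved.

```java
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Hypothetical helper: renders a byte array as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Pure ASCII text: the UTF-8 bytes are identical to the ASCII bytes.
        System.out.println(hex("asdf".getBytes(StandardCharsets.US_ASCII))); // 61 73 64 66
        System.out.println(hex("asdf".getBytes(StandardCharsets.UTF_8)));    // 61 73 64 66

        // Java's UTF-16 charset puts a big-endian BOM (FE FF) in front of the text.
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16)));      // FE FF 00 61

        // U+4E2D, a Chinese character, takes three bytes in UTF-8 and carries no marker.
        System.out.println(hex("\u4E2D".getBytes(StandardCharsets.UTF_8)));  // E4 B8 AD
    }
}
```

Nothing in the UTF-8 output announces the encoding, which is why an editor has to guess.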
Therefore, your editor has no way of knowing that this file is supposed to be UTF-8. You could write UTF-8’s BOM (the three bytes EF BB BF) to your file by:
out.write(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
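However, in this case, out would have to be an OutputStream, such as the FileOutputStream. (BufferedWriter and OutputStreamWriter do not accept byte arrays for input. With a Writer you would instead write the single character '\uFEFF', which the UTF-8 encoder turns into those same three bytes.)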
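Putting this together, a minimal sketch of the Writer-based variant might look like the following. It keeps the question's `createFile` name and stream classes but adds a `text` parameter, and the class name `Utf8BomWriter` is made up; both are assumptions for illustration, not the asker's actual code. It writes the BOM as the character U+FEFF and lets the UTF-8 encoder produce the bytes.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class Utf8BomWriter {
    // Hypothetical variant of createFile: writes a UTF-8 BOM followed by the text.
    static void createFile(String name, String text) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(name + ".txt"), StandardCharsets.UTF_8))) {
            out.write('\uFEFF'); // encoded by the writer as the bytes EF BB BF
            out.write(text);
        } // closing the writer flushes the buffered characters to disk
    }

    public static void main(String[] args) throws IOException {
        // "\u4E2D\u6587" is the two-character word for "Chinese (language)",
        // escaped so the example does not depend on the source file's encoding.
        createFile("words", "\u4E2D\u6587");
    }
}
```

Many editors treat a leading BOM as a strong hint that the file is UTF-8, but some tools that do not expect one will show it as stray characters, so whether to include it depends on what will read the file.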