I’ve written a little application that does some text manipulation and writes the output to a file (html, csv, docx, xml), and this all appears to work fine on Mac OS X. On Windows, however, I seem to get character encoding problems: many of the ” and ’ characters disappear and are replaced with some weird stuff, usually the closing one of a pair.
I use FreeMarker to create my output files, and there is a byte[] array and, in one case, also a ByteArrayStream between reading the templates and writing the output. I assume this is a character encoding problem, so could someone give me advice or point me to a ‘best practice’ resource for dealing with character encoding in Java?
Thanks
There’s really only one best practice: be aware that Strings and bytes are two fundamentally different things, and that whenever you convert between them, you are using a character encoding (either implicitly or explicitly), which you need to pay attention to.
Typical problematic spots in the Java API are:
- new String(byte[])
- String.getBytes()
- FileReader, FileWriter
All of these implicitly use the platform default encoding, which depends on the OS and the user’s locale settings. Usually, it’s a good idea to avoid this and explicitly declare an encoding in the above cases (which FileReader/FileWriter did not allow before Java 11, so on older versions you have to use an InputStreamReader/OutputStreamWriter).
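As a rough illustration of declaring the encoding explicitly, here is a minimal sketch. The class name, helper methods, and the choice of UTF-8 are assumptions made for the example, not something taken from the question; any encoding works as long as the reader and the writer agree on it.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: every String <-> byte conversion names its charset.
class ExplicitEncodingDemo {

    // Writes text as UTF-8 instead of relying on the platform default.
    static void write(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Reads the file back, decoding the bytes as UTF-8.
    static String read(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u201Cquoted\u201D and \u2018single\u2019";

        // In-memory conversions: pass the charset in both directions.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String back = new String(bytes, StandardCharsets.UTF_8);

        Path tmp = Files.createTempFile("encoding-demo", ".txt");
        write(tmp, text);
        System.out.println(text.equals(back) && text.equals(read(tmp)));
        Files.delete(tmp);
    }
}
```

Because nothing here depends on the platform default, the same code should behave identically on Mac OS X and Windows.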
However, your problems with the quotation marks and your use of a template engine may have a much simpler explanation. What program are you using to write your templates? It sounds like it’s one that inserts ‘smart quotes’, which are part of the Windows-specific cp1252 encoding but don’t exist in the more global ISO-8859-1 encoding.
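To see why smart quotes are fragile, the following sketch checks whether curly quotes can be represented in a given charset. It assumes the Windows code page in question is windows-1252 and that the JDK provides that charset; the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: probes which charsets can represent curly quotes.
class SmartQuoteDemo {

    // True if every character of text has a representation in the charset.
    static boolean canEncode(String text, Charset cs) {
        return cs.newEncoder().canEncode(text);
    }

    // Encodes and decodes with the same charset; unmappable characters
    // are replaced during encoding, so the result may differ from the input.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String text = "\u201Chello\u201D"; // left and right double smart quotes
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println(canEncode(text, cp1252));                      // true
        System.out.println(canEncode(text, StandardCharsets.ISO_8859_1)); // false
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1)); // ?hello?
    }
}
```

The last line shows the lossy case: ISO-8859-1 has no slot for the curly quotes, so each one is silently replaced with a question mark.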
What you probably need to do is to be aware which encoding your templates are saved in, and configure your template engine to use that encoding when reading in the templates. Also be aware that some text files, specifically XML, explicitly declare the encoding in a header, and if that header disagrees with the actual encoding used by the file, you’ll invariably run into problems.
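The point about XML headers can be sketched with the JDK’s built-in parser. This is only an illustration: the class name, the helper methods, and the tiny document are invented for the example. It parses the same text twice, once with a declaration that matches the bytes and once with one that does not.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: an XML declaration must match the real byte encoding.
class XmlDeclarationDemo {

    // Parses raw bytes; the parser picks the decoder from the XML declaration.
    static String parseText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    // True only if parsing succeeds and the text comes back unchanged.
    static boolean survives(byte[] xmlBytes, String expected) {
        try {
            return expected.equals(parseText(xmlBytes));
        } catch (Exception e) {
            return false; // a parse failure also counts as not surviving
        }
    }

    public static void main(String[] args) {
        String text = "\u201Chi\u201D";
        String body = "<q>" + text + "</q>";

        // Declaration and bytes agree: both are UTF-8.
        byte[] consistent = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body)
                .getBytes(StandardCharsets.UTF_8);

        // Declaration says ISO-8859-1, but the bytes are really UTF-8.
        byte[] mismatched = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(survives(consistent, text));
        System.out.println(survives(mismatched, text));
    }
}
```

Only the consistent document should return the original text. In the mismatched case the parser trusts the declaration and decodes the bytes with the wrong charset, so the quotes come back as something else.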