I have an XML page with some elements in various languages: Arabic, English, Chinese, Japanese. Which encoding should I choose for that? If I try to render the XML with an XSL stylesheet (using utf-8, ISO-8859-6 or ISO-2022-JP), I get this error:
An invalid character was found in text content.
How shall I solve this?
Thanks.
Of the encodings you tried, UTF-8 is the only one that can handle all those scripts. It’s also the default encoding for XML, and the only encoding that makes sense for a modern application. (For storage/on-the-wire, anyway; for internal processing your language’s string type would be more likely to be UTF-16 or UTF-32.)
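As a quick illustration of that point, here is a minimal Python sketch that tries to encode one short sample string per language in each of the three encodings from the question. The sample strings and the `encodable` helper are made up for this example; only the codec names come from the question.

```python
# Hypothetical sample strings, one per language mentioned in the question.
samples = {
    "Arabic": "مرحبا",
    "English": "hello",
    "Chinese": "你好",
    "Japanese": "こんにちは",
}


def encodable(text, encoding):
    """Return True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# For each encoding, collect the languages whose sample it can represent.
supported = {
    enc: [name for name, text in samples.items() if encodable(text, enc)]
    for enc in ("utf-8", "iso-8859-6", "iso-2022-jp")
}

for enc, names in supported.items():
    print(enc, "->", names)
```

Only the UTF-8 row lists all four languages; each legacy encoding fails on at least one of the samples.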
It would seem from the error that you have a problem in the input file, rather than an issue with your choice of output encoding. Maybe it’s encoded in something other than UTF-8 but is missing an <?xml encoding?> declaration to say so. Or maybe there’s an invalid ISO-2022-JP escape sequence? (This is a horror of an encoding.)
You should try to load the input file into something that parses XML (e.g. Firefox or IE) and see what errors, if any, it comes up with.
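If a browser isn't handy, any XML parser will do the same job. A rough sketch using Python's standard `xml.etree.ElementTree`; the `check_xml` helper and the two sample documents are invented for the example and are not your actual file:

```python
import xml.etree.ElementTree as ET


def check_xml(data: bytes):
    """Parse raw bytes as XML; return None if well-formed, else an error summary."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"line {line}, column {column}: {err}"
    return None


# A UTF-8 document with Arabic text parses cleanly.
good = "<?xml version='1.0' encoding='utf-8'?><item>مرحبا</item>".encode("utf-8")

# The same text encoded as ISO-8859-6 but with no encoding declaration:
# the parser assumes UTF-8 and hits byte sequences that are not valid UTF-8.
bad = b"<item>" + "مرحبا".encode("iso-8859-6") + b"</item>"

print(check_xml(good))
print(check_xml(bad))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) so that the parser, not Python, decides how to decode the bytes.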
(You can’t mix encodings in a single XML file. If you’ve spat out byte strings from different sources into XML, you’ve already lost. How is this XML generated?)
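If the XML is in fact assembled from sources in different encodings, the usual fix is to decode each piece to Unicode text with its own encoding first, and only then serialise the whole document once as UTF-8. A minimal sketch, assuming you know each source's encoding; the `sources` list, the `items`/`item` element names and the `lang` attribute are hypothetical (needs Python 3.8+ for `xml_declaration`):

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs: (language tag, raw bytes, encoding those bytes are in).
sources = [
    ("ar", "مرحبا".encode("iso-8859-6"), "iso-8859-6"),
    ("ja", "こんにちは".encode("iso-2022-jp"), "iso-2022-jp"),
    ("en", b"hello", "ascii"),
]


def build_document(sources):
    """Decode every source to str, then serialise the whole tree once as UTF-8."""
    root = ET.Element("items")
    for lang, raw, encoding in sources:
        item = ET.SubElement(root, "item", {"lang": lang})
        item.text = raw.decode(encoding)  # bytes -> Unicode text
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


document = build_document(sources)
print(document.decode("utf-8"))
```

The result is a single UTF-8 byte string with a matching declaration, so the XSL processor never sees bytes from the legacy encodings.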