I have an XML document that's generated from content people copy/paste from all sorts of places (mostly Word documents).
It looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<data> <![CDATA[
(whatever was pasted)
]]></data>
</response>
I've always used UTF-8 or ISO-8859-1 as the encoding, but now someone has copy/pasted the Unicode character U+001A (0x1A, the SUB control character), and I can't find an encoding that will accept it. Every tool I open the XML file in (e.g. Firefox, Internet Explorer, XMLSpy) reports that it's invalid, regardless of the encoding used.
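A minimal reproduction, assuming Python 3 and its standard-library `xml.etree.ElementTree` (whose Expat-based parser stands in here for Firefox, IE and XMLSpy). The helper name `try_parse` is hypothetical. It wraps the same pasted text in the document structure above and declares it as each encoding in turn. The parser rejects the control character either way, because the character itself is the problem, not the way it is encoded.

```python
import xml.etree.ElementTree as ET


def try_parse(payload: str, encoding: str) -> bool:
    """Hypothetical helper: wrap payload in the <response><data> CDATA
    structure, encode it with the declared encoding, and report whether
    the standard-library parser accepts it."""
    doc = (
        f'<?xml version="1.0" encoding="{encoding}"?>\n'
        f"<response><data><![CDATA[{payload}]]></data></response>"
    ).encode(encoding)
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Pasted text containing U+001A (SUB).
bad = "pasted text\x1awith a control character"
for enc in ("UTF-8", "ISO-8859-1"):
    # Rejected for both encodings: U+001A is not an allowed XML character.
    print(enc, try_parse(bad, enc))
```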
Is there an encoding I can use that will stop the file from falling over, or do I need to start stripping all these characters out one by one?
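If stripping is the route taken, it doesn't have to be done one character at a time. A minimal sketch, assuming Python 3, removes everything outside the XML 1.0 `Char` production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF) in a single regex pass. `strip_invalid_xml_chars` is a hypothetical helper name, not an existing library function.

```python
import re

# Matches any character NOT allowed by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Hypothetical helper: drop every character XML 1.0 disallows
    (e.g. U+001A) before the text is written into the document."""
    return _INVALID_XML_CHARS.sub("", text)


pasted = "pasted text\x1awith a control character"
print(repr(strip_invalid_xml_chars(pasted)))  # 'pasted textwith a control character'
```

Dropping the character is only one option; substituting a placeholder such as U+FFFD would keep a visible marker of where the bad character was.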
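U+001A is not a valid character in an XML document, so no choice of encoding will help. The encoding only determines how characters are stored as bytes, not which characters are allowed, and a CDATA section doesn't exempt its contents either. The valid range of characters according to the specification is: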