When loading XML into an XmlDocument, like this:

    XmlDocument document = new XmlDocument();
    document.LoadXml(xmlData);

is there any way to stop the process from replacing entities? I’ve got a strange problem where a TM symbol, stored in the XML as the character reference &#8482;, is being converted into the ™ character. As far as I’m concerned this shouldn’t happen, as the XML document has the encoding ISO-8859-1 (which doesn’t have the TM symbol).
Thanks
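This is a standard misunderstanding of the XML toolset. Character references such as ‘&#8482;’ (or the hexadecimal form ‘&#x2122;’) are a syntactic feature designed to cope with character encodings: they let you write a character that the document’s encoding can’t represent. Your XmlDocument isn’t a stream of characters. It has been freed of character encoding issues and instead contains an abstract model of the XML data. XmlDocument is an implementation of the DOM, and the abstract data model it holds is what the W3C calls the XML Information Set (Infoset), so both terms apply. The ISO-8859-1 in your declaration only tells a parser how to turn bytes into characters; once parsed, the text lives in .NET strings, which can hold ™ perfectly well.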
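To see this concretely, here is a small sketch that loads a string like yours and inspects what the DOM actually holds. The `<product>` document is a made-up sample, and the code assumes a recent .NET with C# top-level statements.

```csharp
using System;
using System.Xml;

// Hypothetical sample document: the trademark sign is written as a
// numeric character reference inside ISO-8859-1 labelled XML.
string xmlData =
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" +
    "<product>Widget&#8482;</product>";

XmlDocument document = new XmlDocument();
document.LoadXml(xmlData);

// The parser has already resolved the reference: the text node now holds
// the single character U+2122, and nothing records the original "&#8482;".
string text = document.DocumentElement!.InnerText;

Console.WriteLine(text.Length);                    // 7 ("Widget" plus one character)
Console.WriteLine(((int)text[6]).ToString("X4"));  // 2122
```

Printing `document.OuterXml` should show the raw ™ as well, since OuterXml produces a .NET string rather than ISO-8859-1 bytes, which is probably what you were seeing.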
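The ‘&#...;’ gubbins won’t exist in this model because the issue is irrelevant there. It will return, if appropriate, when you serialize the Infoset back into a character stream in some specific encoding: if that encoding (ISO-8859-1, say) can’t represent ™, a well-behaved writer will fall back to a character reference.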
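A sketch of that round trip, again using a made-up `<product>` document: it saves through an `XmlWriter` whose `XmlWriterSettings.Encoding` is ISO-8859-1. As far as I know the .NET writers substitute a numeric character reference for characters the target encoding lacks, rather than throwing or writing ‘?’, but treat the exact form of the reference (hex or decimal) as something to check on your runtime.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical sample document, loaded exactly as in the question.
XmlDocument document = new XmlDocument();
document.LoadXml(
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" +
    "<product>Widget&#8482;</product>");

// Serialize the in-memory model back to bytes in ISO-8859-1.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
var settings = new XmlWriterSettings { Encoding = latin1 };

using var stream = new MemoryStream();
using (XmlWriter writer = XmlWriter.Create(stream, settings))
{
    document.Save(writer);
}

// Decode the bytes for display. ISO-8859-1 has no byte for U+2122, so expect
// a numeric character reference (e.g. &#x2122;) where the ™ was.
string serialized = latin1.GetString(stream.ToArray());
Console.WriteLine(serialized);
```

If you write with UTF-8 instead, the ™ can be encoded directly, so you would expect the raw character in the output rather than a reference.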
This misunderstanding is common enough to have made it into the academic literature, as one of a collection of similar quirks. Take a look at ‘XML Fever’: http://doi.acm.org/10.1145/1364782.1364795