.NET’s XmlTextWriter creates invalid XML files.
In XML, some control characters are allowed, like ‘horizontal tab’ (&#x9;), but others are not, like ‘vertical tab’ (&#xB;). (See spec.)
I have a string which contains a UTF-8 control character that is not allowed in XML.
Although XmlTextWriter escapes the character, the resulting XML is of course still invalid, because a character reference to a disallowed character is not well-formed either.
How can I make sure that XmlTextWriter never produces an illegal XML file?
Or, if it’s not possible to do this with XmlTextWriter, how can I strip the specific control characters that aren’t allowed in XML from a string?
Example code:
using System.Text;
using System.Xml;

using (XmlTextWriter writer =
    new XmlTextWriter("test.xml", Encoding.UTF8))
{
    writer.WriteStartDocument();
    writer.WriteStartElement("Test");
    // \xb is U+000B (vertical tab), which XML 1.0 does not allow
    writer.WriteValue("hello \xb world");
    writer.WriteEndElement();
    writer.WriteEndDocument();
}
Output:
<?xml version="1.0" encoding="utf-8"?><Test>hello &#xB; world</Test>
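This behaviour is documented, somewhat hidden, under the WriteString method, but it sounds like it applies to the whole class: an XmlWriter created with XmlWriter.Create throws an ArgumentException by default when asked to write a character that XML does not allow, whereas with the CheckCharacters setting turned off such characters are written as numeric character entities.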
So it seems that you end up writing invalid characters because you are instantiating the XmlTextWriter class directly. A better solution would be to create the writer with XmlWriter.Create instead, which checks characters by default.
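A minimal sketch of both approaches, assuming .NET 4.0 or later (for XmlConvert.IsXmlChar). The names XmlCharDemo, WriteTest and StripInvalidXmlChars are made up for this example, and the writer targets a StringBuilder rather than test.xml so the result is easy to inspect (which means the declaration will say utf-16 rather than utf-8). With CheckCharacters left at its default of true, the writer should reject the vertical tab instead of escaping it; stripping the string first is one possible way to get the text written anyway.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlCharDemo
{
    // Hypothetical helper: removes every character that XML 1.0 does not allow.
    // Well-formed surrogate pairs (U+10000 and above) are kept; lone surrogates are dropped.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Hypothetical helper: writes <Test>value</Test> through XmlWriter.Create.
    public static string WriteTest(string value)
    {
        // CheckCharacters is true by default; it is set here only to make that explicit.
        var settings = new XmlWriterSettings { CheckCharacters = true };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("Test");
            writer.WriteString(value);
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string input = "hello \u000B world";

        try
        {
            WriteTest(input);
            Console.WriteLine("Written without complaint.");
        }
        catch (ArgumentException ex)
        {
            // The checking writer refuses the vertical tab instead of escaping it.
            Console.WriteLine("Rejected: " + ex.Message);
        }

        // After stripping, the same text goes through.
        Console.WriteLine(WriteTest(StripInvalidXmlChars(input)));
    }
}
```

Whether to strip the offending characters silently or let the exception surface depends on whether losing them matters for your data; the exception at least makes the problem visible at write time rather than when someone tries to parse the file.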