I have a program that generates XML files from data in a database. In short, the code does the following:
string dsn = "a db connection string";
XmlDocument d = new XmlDocument();
using (SqlConnection con = new SqlConnection(dsn)) {
    con.Open();
    string sql = "select id as Id, comment as Comment from Test where ... ";
    using (SqlCommand cmd = new SqlCommand(sql, con)) {
        DataSet ds = new DataSet("EXPORT");
        SqlDataAdapter da = new SqlDataAdapter(cmd);
        da.Fill(ds, "Test");
        d.LoadXml(ds.GetXml());
    }
}
d.Save(@"c:\test.xml");
When I look at the XML file, it contains the invalid character reference &#x1A;
<EXPORT>
<Test>
<Id>2</Id>
<Comment>&#x1A; Keyboard NB5 linked</Comment>
</Test>
</EXPORT>
Firefox cannot open this XML file; it reports an invalid character …
That entity is reserved in ISO 8859-1 and CP1252 and should not be rendered by browsers. But why does XmlDocument output XML that cannot be parsed as valid? Or is it a valid XML document that just cannot be parsed by browsers or imported by Excel and so on?
Is there an easy way of getting rid of these reserved 'invalid characters', or of encoding them in a way that browsers do not have a problem with?
Many thanks for your opinions and tips.
Not all characters are representable in XML.
In XML 1.0, none of the characters with values less than 0x20 can be used, except for TAB (0x09), LF (0x0A) and CR (0x0D).
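To make the XML 1.0 rule concrete, here is a minimal sketch of a check against the Char production of the XML 1.0 specification. The class and method names are made up for illustration; on .NET 4.0 and later, XmlConvert.IsXmlChar should give a comparable answer for a single char.

```csharp
using System;

// Hypothetical helper, not part of the .NET framework.
public static class XmlCharRules
{
    // True if the Unicode code point is allowed in an XML 1.0 document:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsLegalXml10CodePoint(int codePoint)
    {
        return codePoint == 0x09
            || codePoint == 0x0A
            || codePoint == 0x0D
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

With this check, XmlCharRules.IsLegalXml10CodePoint(0x1A) is false, while the same call with 0x09 (TAB) is true.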
In XML 1.1, just about anything except NUL (0x00) can be used.
If you have the option to use XML 1.1, and the receiving program supports XML 1.1 (not many do), then you can escape the 0x1A as &#x1A; or &#26;.
Wrapping it in CDATA is not a solution either; CDATA is just a convenience for escaping groups of characters differently than the standard &-mechanism.
If XML 1.1 is not an option, you will need to remove the character prior to serializing.
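Removing the offending characters before serializing could look roughly like the sketch below. It assumes the data arrives in a DataSet as in the question; XmlSanitizer, StripIllegalXmlChars and CleanStrings are hypothetical names, and string columns are assumed to be writable.

```csharp
using System;
using System.Data;
using System.Text;

// Hypothetical helper, not part of the .NET framework.
public static class XmlSanitizer
{
    // Returns a copy of the text without characters that XML 1.0 forbids.
    public static string StripIllegalXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes a legal code point
                // at or above U+10000; keep both halves.
                sb.Append(c).Append(text[++i]);
            }
            else if (c == '\t' || c == '\n' || c == '\r'
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD))
            {
                sb.Append(c);
            }
            // Everything else (0x1A, lone surrogates, ...) is dropped.
        }
        return sb.ToString();
    }

    // Cleans every non-null string value in the DataSet in place.
    public static void CleanStrings(DataSet ds)
    {
        foreach (DataTable table in ds.Tables)
            foreach (DataRow row in table.Rows)
                foreach (DataColumn col in table.Columns)
                    if (col.DataType == typeof(string) && !row.IsNull(col))
                        row[col] = StripIllegalXmlChars((string)row[col]);
    }
}
```

In the question's code, calling XmlSanitizer.CleanStrings(ds) between da.Fill(ds, "Test") and ds.GetXml() should keep the 0x1A out of the saved file. Dropping the character is a design choice; replacing it with a space or a visible placeholder would work the same way.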