I’m loading a string containing some html into an XmlDocument class, in order to

Asked: June 13, 2026 (2026-06-13T23:10:28+00:00)

I’m loading a string containing some html into an XmlDocument class, in order to do some manipulation on it, before converting it back into a string again.

The following code demonstrates what I’m doing:

    // Example of the HTML I am working with
    var documentTypeDeclaration = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";
    var html = documentTypeDeclaration + "<html><body><div>&#163;300&#160;&#169;</div></body></html>";

    // Load the HTML into an XmlDocument
    var xmlDocument = new XmlDocument();
    xmlDocument.XmlResolver = null;
    xmlDocument.LoadXml(html);

    // Manipulate the HTML...

    // Get the HTML back out
    var savedHtml = xmlDocument.OuterXml;
    Console.WriteLine(html);
    Console.WriteLine(savedHtml);

I would expect the two lines written to the console to match, but instead I get this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html><body><div>&#163;300&#160;&#169;</div></body></html>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"[]><html><body><div>£300 ©</div></body></html>

So it looks like [] (an empty internal subset) has been added to the DOCTYPE declaration, and all the numeric character references (&#163;, &#160;, &#169;) have been converted to their actual characters. This is particularly annoying because the HTML is no longer standards compliant.

Does anyone know how I can stop the XmlDocument class from doing this?

1 Answer

Editorial Team · Answer 1 · 2026-06-13T23:10:28+00:00

Does anyone know how I can stop the XmlDocument class from doing this?

No, but I would use a real HTML parser (Html Agility Pack in the example below) instead of XmlDocument:

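    // Requires the HtmlAgilityPack package, plus "using System.IO;" for StringWriter
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);

    // Manipulate the HTML...

    // Serialize the document back to a string
    StringWriter wr = new StringWriter();
    doc.Save(wr);
    string html2 = wr.ToString();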
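
If switching parsers is not an option, the two symptoms from the question can probably be worked around inside System.Xml. The following is only a minimal sketch under stated assumptions. The class and method names (`XhtmlRoundTrip`, `LoadAndSave`) are hypothetical. It assumes the input is well-formed XHTML with no internal DTD subset, and that ASCII-only output is acceptable. It recreates the DOCTYPE node with a null internal subset so that no `[]` is written, and it saves through an `XmlWriter` configured for ASCII, so that characters outside ASCII are emitted as numeric character references again.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XhtmlRoundTrip
{
    // Hypothetical helper (not from the original post): loads XHTML into an
    // XmlDocument and serializes it back as ASCII-only markup.
    public static string LoadAndSave(string xhtml)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = null; // do not try to download the external DTD
        doc.LoadXml(xhtml);

        // Manipulate the document here...

        // An empty (non-null) internal subset is what gets written as "[]".
        // Recreating the DOCTYPE with a null subset avoids that.
        XmlDocumentType oldType = doc.DocumentType;
        if (oldType != null && string.IsNullOrEmpty(oldType.InternalSubset))
        {
            XmlDocumentType newType = doc.CreateDocumentType(
                oldType.Name, oldType.PublicId, oldType.SystemId, null);
            doc.ReplaceChild(newType, oldType);
        }

        // With an ASCII target encoding, the writer has to escape every
        // character it cannot encode as a numeric character reference.
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII,
            OmitXmlDeclaration = true
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

Calling `XhtmlRoundTrip.LoadAndSave(html)` with the string from the question should give back markup without the `[]`. As far as I know the writer emits the references in hexadecimal form (`&#xA3;` rather than `&#163;`), which denotes the same character, so the output is equivalent to the input but not byte-for-byte identical.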
