I have some code that returns the InnerXml of an XmlNode.
The node can contain just some text (with HTML) or XML.
For example:
<XMLNode> Here is some &lt;strong&gt;HTML&lt;/strong&gt; </XMLNode>
or
<XMLNode> <XMLContent>Here is some content</XMLContent> </XMLNode>
If I get the InnerXml of <XMLNode>, the HTML tags are returned as XML entities.
I cannot use InnerText because I need to be able to get the XML contents. So all I really need is a way to un-escape the HTML tags, because I can detect if it’s XML or not and act accordingly.
I guess I could use HTMLDecode, but will this decode all the XML encoded entities?
Update: I guess I’m rambling a bit above so here is a clarified scenario:
I have an XML document that looks like this:
<content id='1'>
  <data>&lt;p&gt;A Test&lt;/p&gt;</data>
</content>
<content id='2'>
  <data>
    <dataitem>A test</dataitem>
  </data>
</content>
If I do:
XmlNode xn1 = document.SelectSingleNode("/content[@id=1]/data");
XmlNode xn2 = document.SelectSingleNode("/content[@id=2]/data");
Console.WriteLine(xn1.InnerXml);
Console.WriteLine(xn2.InnerXml);
xn1 will return
&lt;p&gt;A Test&lt;/p&gt;
xn2 will return
<dataitem>A test</dataitem>
I am already checking whether what is returned is XML (as in the case of xn2), so all I need to do is un-escape the &lt; etc. in xn1.
HTMLDecode does this, but I’m not sure it would work for everything. So the question remains: would HTMLDecode handle all the possible entities, or is there a class somewhere that will do it for me?
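For comparison, here is a minimal sketch of how InnerXml and InnerText differ on this data. It assumes the two content elements sit inside a hypothetical <root> wrapper (not part of the document above) so that the XML has a single root, and the class and method names are made up for illustration.

```csharp
using System;
using System.Xml;

public static class InnerXmlVsInnerText
{
    // Sample data from the question. The <root> wrapper is an assumption,
    // added only so that the document has a single root element.
    public const string Sample =
        "<root>" +
        "<content id='1'><data>&lt;p&gt;A Test&lt;/p&gt;</data></content>" +
        "<content id='2'><data><dataitem>A test</dataitem></data></content>" +
        "</root>";

    // Loads the sample and returns the <data> node of the requested <content>.
    public static XmlNode SelectData(int id)
    {
        var document = new XmlDocument();
        document.LoadXml(Sample);
        return document.SelectSingleNode("/root/content[@id=" + id + "]/data");
    }

    // InnerXml serializes the children as markup, so text is re-escaped.
    public static string AsMarkup(XmlNode node)
    {
        return node.InnerXml;
    }

    // InnerText returns the parsed text value, with entities already resolved
    // by the XML parser, so no separate decoding step is involved.
    public static string AsText(XmlNode node)
    {
        return node.InnerText;
    }
}
```

Calling AsMarkup(SelectData(1)) gives the escaped form, while AsText(SelectData(1)) gives the text as the parser decoded it. For the second node, AsMarkup(SelectData(2)) gives the child element as XML.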
I think Tomalak is on the right track, but I’d write the code a little differently:
This code makes a lot of your implicit assumptions explicit, and when you encounter data that’s not in the form you expect, it will tell you why it failed.
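A rough sketch in that spirit, not a definitive implementation: DataReader, GetContent and the exception messages are hypothetical names chosen for illustration. It assumes each data node has exactly one child, either text or a single element, and that the document was loaded with the default XmlDocument whitespace handling, which drops whitespace-only nodes between elements.

```csharp
using System;
using System.Xml;

public static class DataReader
{
    // Hypothetical helper: returns the decoded text for a text-only node,
    // or the serialized XML when the single child is an element.
    public static string GetContent(XmlNode data)
    {
        if (data == null)
            throw new ArgumentNullException(nameof(data));

        // Assumption made explicit: exactly one child node is expected.
        if (data.ChildNodes.Count != 1)
            throw new InvalidOperationException(
                "Expected exactly one child node, found " + data.ChildNodes.Count + ".");

        XmlNode child = data.FirstChild;
        switch (child.NodeType)
        {
            case XmlNodeType.Text:
            case XmlNodeType.CDATA:
                // Value holds the text with entities already resolved by the parser.
                return child.Value;
            case XmlNodeType.Element:
                // The content is real XML, so return it as markup.
                return data.InnerXml;
            default:
                throw new InvalidOperationException(
                    "Unexpected child node type: " + child.NodeType + ".");
        }
    }
}
```

With the nodes from the question, GetContent(xn1) would return the unescaped HTML text and GetContent(xn2) would return the dataitem element as XML. Anything else, such as a node with several children, fails with a message saying what was found.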