I would like a short, simple way to strip tags from an XHTML document, and I believe one of the available options must be concise enough: XSLT, XPath, XQuery, or custom C# programming using the .NET XML namespace. I'm open to others.
For example, I want to strip all <b> tags from an XHTML document but keep their inner content and child tags (i.e. not simply skip the bold tag and its children).
I need to maintain the structure of the original document minus the stripped tags.
Thoughts:
- I've seen XSLT's ability to match elements for selection; however, I want to match everything by default with a couple of exceptions, and I'm unsure whether it is suited to this. This is what I'm looking at right now.
- I haven't started looking into XQuery. (Update: I took a brief look at it, and it is similar enough to SQL in function that I fail to see how it could maintain the nested node structure of the original document. I don't think it is a contender.)
- A custom C# program using the .NET XML namespace might be viable, as I already have an idea for it, but my immediate assumption is that it would be more involved than using one of the XML-specific matching languages that were created for this kind of task.
- … another kind of enabling technology I haven't yet considered…
Have you considered XSLT? It is a language designed specifically for transforming XML, and tree structures in general.
This transformation:
when applied to any XHTML document, such as the one below:
produces the wanted, correct result, in this case:
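For illustration, here is a minimal sketch of the usual XSLT 1.0 pattern for "copy everything by default, with a couple of exceptions". It is not necessarily the stylesheet this answer originally showed. It assumes the document's elements are in the standard XHTML namespace; the prefix `x` is an arbitrary choice for this sketch. If the input has no namespace, the second template would match plain `b` instead.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml">

  <!-- Identity rule: by default, copy every node and attribute unchanged. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Exception (assumes the XHTML namespace bound to the prefix x):
       for <b>, emit no element but keep processing its children,
       so the inner text and child tags survive in place. -->
  <xsl:template match="x:b">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

With a stylesheet like this, a fragment such as `<p>Some <b>bold <i>and italic</i></b> text</p>` should come out as `<p>Some bold <i>and italic</i> text</p>`. Stripping further tags would only require adding their names to the match pattern of the second template, for example `match="x:b | x:font"`. Any attributes on the stripped elements are dropped along with the tags.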