I have run into a dilemma. In a particular application, I’m receiving XML results from a SOAP request that look like this:
<env:Envelope xmlns:env='http://schemas.xmlsoap.org/soap/envelope/'>
<env:Header />
<env:Body>
<ns1:searchResponse xmlns:ns1='http://url.to.namespace' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<ns1:result>&lt;?xml version="1.0"?&gt;&lt;results count="201" returned="201" code="200" msg="successful"&gt;&lt;result order="0"&gt;&lt;dirkey&gt;DK886shn3525&lt;/dirkey&gt;&lt;eid&gt;smith&lt;/eid&gt;&lt;email&gt;smith@me.edu&lt;/email&gt;&lt;fn&gt;Smith&lt;/fn&gt;&lt;ln&gt;Bob&lt;/ln&gt;&lt;wid&gt;859589157&lt;/wid&gt;&lt;score&gt;70&lt;/score&gt;&lt;/result&gt;&lt;result order="1"&gt;&lt;dirkey&gt;DK547fjx6702&lt;/dirkey&gt;&lt;eid&gt;james31&lt;/eid&gt;&lt;email&gt;ta@me.edu&lt;/email&gt;&lt;fn&gt;Tim&lt;/fn&gt;&lt;ln&gt;Allen&lt;/ln&gt;&lt;stu&gt;&lt;lvl&gt;Senior&lt;/lvl&gt;&lt;plans&gt;&lt;plan&gt;Technology Management-B&lt;/plan&gt;&lt;/plans&gt;&lt;contacts&gt;&lt;contact type="permanent"&gt;&lt;city&gt;Salina&lt;/city&gt;&lt;phone&gt;(123) 456-7890&lt;/phone&gt;&lt;postal&gt;67401&lt;/postal&gt;&lt;street1&gt;1111 Main Ln&lt;/street1&gt;&lt;state&gt;KS&lt;/state&gt;&lt;/contact&gt;&lt;/contacts&gt;&lt;/stu&gt;&lt;wid&gt;2222222222&lt;/wid&gt;&lt;score&gt;20&lt;/score&gt;&lt;/result&gt;</ns1:result>
</ns1:searchResponse>
</env:Body>
</env:Envelope>
I am most interested in the data contained within the <ns1:result> element, which the provider returns as escaped text rather than as child elements. While this might make sense in an HTML world, I need the <ns1:result> text as XML. Intrigued by the possibility of doing this via XSL, I constructed the following stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:ns1="http://url.to.namespace"
exclude-result-prefixes="env ns1">
<xsl:output omit-xml-declaration="yes" indent="yes" method="text" />
<xsl:strip-space elements="*"/>
<!-- Template #1 - Identity Transform -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- Template #2 - for all text() nodes, disable output escaping -->
<xsl:template match="text()">
<xsl:value-of select="." disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
…which technically does produce what I want:
<?xml version="1.0"?>
<results count="201" returned="201" code="200" msg="successful">
<result order="0">
<dirkey>DK886shn3525</dirkey>
<eid>smith</eid>
<email>smith@me.edu</email>
<fn>Bob</fn>
<ln>Smith</ln>
<wid>859589157</wid>
<score>70</score>
</result>
<result order="1">
<dirkey>DK547fjx6702</dirkey>
<eid>ta</eid>
<email>ta@me.edu</email>
<fn>Tim</fn>
<ln>Allen</ln>
<stu>
<lvl>Senior</lvl>
<plans>
<plan>Technology Management-B</plan>
</plans>
<contacts>
<contact type="permanent">
<city>Salina</city>
<phone>(123) 456-7890</phone>
<postal>67401</postal>
<street1>1111 Main Ln</street1>
<state>KS</state>
</contact>
</contacts>
</stu>
<wid>2222222222</wid>
<score>20</score>
</result>
</results>
However, I’ve heard it said that DOE (disable-output-escaping) is the sign of a desperate individual. Indeed, when I try to run this XSLT through an application of ours (one that is designed to transform XML before passing it on to a templating engine), it doesn’t work. I’m guessing that DOE is not implemented in our particular XSLT processor…
So, here’s the ultimate question: is there a way in XSLT 1.0 to unescape these entities without using a parser-specific tactic like DOE? My one thought is constructing a method that translates certain escaped characters (e.g., &gt;) into their literal counterparts (>)…but I’m not entirely sure how I’d go about that.
As always, I appreciate your assistance.
P.S. Please, don’t bother telling me how disgusting this output is or how they’ve mangled their document structure; we’ve already tried to get them to change it and that’s not an option. 🙁
There isn’t a pure XSLT way to reconstruct destroyed markup before XSLT 3.0 (a W3C working draft at the time of writing), which will have a standard function for this, parse-xml().
Until you have XSLT 3.0 available, the safe way to reconstruct destroyed markup is to call an extension function with a similar signature, which you have to write yourself.
This extension function tries to parse its string argument into an instance of XmlDocument and, if successful, returns the result.
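To make the idea concrete, here is a minimal sketch of what such a parse function does, written in Python with only the standard library rather than as a real XSLT extension function. The names parse_xml and extract_results are hypothetical, and how you register an extension function with your own XSLT processor is specific to that processor and not shown here. The point is only the behaviour: take the string, try to parse it, and hand back a tree or nothing.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs copied from the SOAP response in the question.
NS = {
    "env": "http://schemas.xmlsoap.org/soap/envelope/",
    "ns1": "http://url.to.namespace",
}


def parse_xml(text: Optional[str]) -> Optional[ET.Element]:
    """Hypothetical helper: parse a string as XML, or return None on failure."""
    if text is None:
        return None
    try:
        # Strip surrounding whitespace so an XML declaration stays at position 0.
        return ET.fromstring(text.strip())
    except ET.ParseError:
        return None


def extract_results(soap_response: str) -> Optional[ET.Element]:
    """Find <ns1:result> in the envelope and re-parse its text content."""
    envelope = ET.fromstring(soap_response)
    holder = envelope.find("env:Body/ns1:searchResponse/ns1:result", NS)
    if holder is None:
        return None
    # The outer parser has already turned &lt; and &gt; back into < and >,
    # so holder.text is the embedded document as a plain string.
    return parse_xml(holder.text)
```

Note that no manual character translation is needed: reading the text node already unescapes the entities, and the second parse either yields a proper tree or fails cleanly when the embedded markup is not well-formed.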