Given an XML document like this:
<r>
  <a/><b/><c/>
  <d>
    <d1/>
    <d2>
      <d2a/>
      <d2b/>
      <d2c/>
    </d2>
  </d>
  <e/>
</r>
And given the criteria “start at b, stop at d2b”, is there an XPath expression that can select either of the following?
Ideally:
<c/><d><d1/><d2><d2a/></d2></d>
Reasonably:
<c/>
I know that with the criteria “start at ‘a’ and end at ‘e’” I can use the expression //*[preceding-sibling::a][following-sibling::e]. I’m wondering whether there is a way to combine the ancestor axes with the preceding siblings to find a common ancestor when the start and end elements are not guaranteed to share the same parent.
XPath (both 1.0 and 2.0) is a query language for XML documents. As such it cannot alter the nodes and structure of any XML document.
The wanted result can be obtained via an XSLT transformation.
I. XSLT 1.0 solution:
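A minimal sketch of such a transformation, written to match the explanation given further down and not necessarily the stylesheet originally posted. The parameter names pStart and pEnd and the helper variables are assumptions. Two choices are also mine: the ancestors of the end element are added to the "preceding the end" set so that the wrappers d and d2 survive, and an element outside the range is dropped while its element children are still visited.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Start and end markers from the question (names are assumptions) -->
  <xsl:param name="pStart" select="//b"/>
  <xsl:param name="pEnd" select="//d2b"/>

  <!-- Elements "following the start" -->
  <xsl:variable name="vAfterStart" select="$pStart/following::*"/>
  <!-- Elements "preceding the end"; preceding:: excludes ancestors,
       so they are added explicitly to keep the wrappers d and d2 -->
  <xsl:variable name="vBeforeEnd"
      select="$pEnd/preceding::* | $pEnd/ancestor::*"/>

  <!-- Identity rule: copies the matched node as-is -->
  <xsl:template match="node()|@*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Overriding rule for every element; the explicit priority avoids
       a conflict with the identity rule -->
  <xsl:template match="*" priority="1">
    <xsl:choose>
      <!-- Membership test: count(.|$set) = count($set) -->
      <xsl:when test="count(.|$vAfterStart) = count($vAfterStart)
                      and count(.|$vBeforeEnd) = count($vBeforeEnd)">
        <xsl:call-template name="identity"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Outside the range: drop the element, still visit its children -->
        <xsl:apply-templates select="*"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With this sketch the root element r is itself outside the range, so the output is the bare fragment from the "Ideally" case with no wrapper element around it.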
When this transformation is applied to the provided XML document, the wanted, correct result is produced.
Explanation:
The identity rule copies every matched node “as-is”.
There is a single overriding template matching any element.
Inside this template two tests are made: whether the current node belongs to the set of all elements “following the start” and whether the current node belongs to the set of all elements “preceding the end”. If so, the current node is passed to the identity template (copied), otherwise it is ignored (deleted).
II. XSLT 2.0 solution
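A minimal XSLT 2.0 sketch along the same lines, again an illustration and not necessarily the stylesheet originally posted. The names pStart, pEnd and vKeep are assumptions, and as a choice of this sketch the ancestors of the end element are included so that the wrapper elements are kept.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Start and end markers from the question (names are assumptions) -->
  <xsl:param name="pStart" select="//b"/>
  <xsl:param name="pEnd" select="//d2b"/>

  <!-- Elements to keep: after the start AND (before the end or one of
       the end's ancestors) -->
  <xsl:variable name="vKeep"
      select="$pStart/following::*
              intersect ($pEnd/preceding::* | $pEnd/ancestor::*)"/>

  <!-- Identity rule -->
  <xsl:template match="node()|@*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Overriding rule for every element -->
  <xsl:template match="*" priority="1">
    <xsl:choose>
      <!-- The current element is kept if it is a member of $vKeep -->
      <xsl:when test="exists(. intersect $vKeep)">
        <xsl:call-template name="identity"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Dropped, but its element children are still visited -->
        <xsl:apply-templates select="*"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```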
When this transformation is applied on the XML file above, again the same correct result is produced.
Explanation: Use of the XPath 2.0 operator intersect.
III. XPath 1.0 solution, selecting just the nodes without altering the document:
For readability I am providing an XSLT transformation that outputs the result of selecting the wanted nodes. With the same purpose, sub-expressions are defined as variables:
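A minimal sketch of such a stylesheet, offered as an illustration and not necessarily the one originally posted. All parameter and variable names are assumptions. The predicate keeps those members of one node-set that also belong to the other.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Start and end markers from the question (names are assumptions) -->
  <xsl:param name="pStart" select="//b"/>
  <xsl:param name="pEnd" select="//d2b"/>

  <!-- Sub-expressions kept in variables for readability -->
  <xsl:variable name="vAfterStart" select="$pStart/following::*"/>
  <xsl:variable name="vBeforeEnd" select="$pEnd/preceding::*"/>

  <xsl:template match="/">
    <!-- Intersection: nodes of $vAfterStart that are also in $vBeforeEnd -->
    <xsl:copy-of
        select="$vAfterStart[count(.|$vBeforeEnd) = count($vBeforeEnd)]"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the preceding axis never contains ancestors, d and d2 are not part of the selection here. On the document from the question the selected elements are c, d1 and d2a.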
When this transformation is applied to the provided XML document (above), the wanted, selected nodes are output:
Explanation: Here I use the Kayessian (by @Michael Kay) formula for the intersection of two node-sets $ns1 and $ns2:
$ns1[count(.|$ns2) = count($ns2)]
IV. Finally, the XPath 2.0 solution (corresponding to the XPath 1.0 solution):
I am again using an XSLT (2.0) transformation to copy the results to the output:
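A minimal sketch, assuming the same hypothetical parameter names pStart and pEnd. It is not necessarily the stylesheet originally posted. The whole selection collapses into a single expression built on the intersect operator.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Start and end markers from the question (names are assumptions) -->
  <xsl:param name="pStart" select="//b"/>
  <xsl:param name="pEnd" select="//d2b"/>

  <xsl:template match="/">
    <!-- Elements after the start that are also before the end -->
    <xsl:copy-of
        select="$pStart/following::* intersect $pEnd/preceding::*"/>
  </xsl:template>
</xsl:stylesheet>
```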
The same results (exactly the wanted nodes) as from the XPath 1.0 solution are produced:
UPDATE: Here is an XPath 1.0 solution for the “reasonably” question. Again it is expressed as an XSLT stylesheet module in which, for better readability, subexpressions are defined as separate variables:
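A minimal sketch of what such a module might look like. It is not necessarily the one originally posted, and every variable name other than pStart and pEnd is an assumption. It assumes that each marker matches exactly one element.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Start and end markers from the question -->
  <xsl:param name="pStart" select="//b"/>
  <xsl:param name="pEnd" select="//d2b"/>

  <!-- Nearest ancestor shared by both markers: ancestor:: is a reverse
       axis, so [1] picks the closest ancestor that passes the filter -->
  <xsl:variable name="vEndAncestors" select="$pEnd/ancestor::*"/>
  <xsl:variable name="vCommon"
      select="$pStart/ancestor::*
                [count(.|$vEndAncestors) = count($vEndAncestors)][1]"/>

  <!-- The child of the common ancestor that contains (or is) the end -->
  <xsl:variable name="vEndPath" select="$pEnd/ancestor-or-self::*"/>
  <xsl:variable name="vEndTop"
      select="$vCommon/*[count(.|$vEndPath) = count($vEndPath)]"/>

  <xsl:variable name="vAfterStart" select="$pStart/following::*"/>
  <xsl:variable name="vBeforeEndTop" select="$vEndTop/preceding::*"/>

  <xsl:template match="/">
    <!-- Same intersection as before, with $vEndTop in place of $pEnd -->
    <xsl:copy-of
        select="$vAfterStart[count(.|$vBeforeEndTop) = count($vBeforeEndTop)]"/>
  </xsl:template>
</xsl:stylesheet>
```

On the document from the question, the common ancestor is r and the replacement end element is d, so only c is selected. Since following:: and preceding:: also cover descendants, any children of c would be part of the selected node-set as well.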
When this transformation is applied to the following XML document (the same as the provided one, but wrapped in one more top element, and with two children, g and h, added to c to make it more interesting):
the wanted, correct node-set is selected and copied to the output:
Explanation: This is almost the same as before, but in place of $pEnd we use its highest ancestor that is an immediate child of the common ancestor of $pStart and $pEnd.