Given an element as context I want to select the preceding sibling element and check to see if it has a particular name. The caveat is that I do not want to select it if there is an intervening text node that has non-whitespace content.
For example, given this XML document…
<r>
<a>a1</a><a>a2</a>
b
<a>a3</a>
<a>a4</a>
<b/>
<a>a5</a>
</r>
…then:
- For “a1” there should be no match (there is no <a> sibling element immediately preceding it)
- For “a2” then “a1” should be matched (there is no intervening text node)
- For “a3” there should be no match (there is an intervening text node with non-whitespace contents)
- For “a4” then “a3” should be matched (the intervening text node is only whitespace)
- For “a5” there should be no match (the preceding sibling element is not an <a>).
I can check to see if the preceding sibling is an <a> with preceding-sibling::*[1][name()="a"]
However, I can’t figure out how to say “select the following sibling node, regardless of whether it is an element or text, and check that it is either not text or has normalize-space(.)=""”. My best guess was this:
preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]
…but that appears to have no effect.
Here’s my test Ruby file:
require 'nokogiri'
xpath = 'preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]'
fragment = Nokogiri::XML.fragment '<a>a1</a><a>a2</a> b <a>a3</a> <a>a4</a> <b/> <a>a5</a>'
fragment.css('a').each{ |a| p [a.text,a.xpath(xpath).to_s] }
#=> ["a1", ""]
#=> ["a2", ""]
#=> ["a3", "<a>a2</a>"]
#=> ["a4", "<a>a3</a>"]
#=> ["a5", ""]
The results for “a2” and “a3” are wrong, and they confuse me. The expression finds the preceding <a> correctly, but then does not correctly verify that the first following sibling of that element is either not text (which should allow “a2” to find “a1”) or whitespace only (which should prevent “a3” from finding “a2”).
Edit: Here’s the XPath I was writing, and what I intended it to do:
- preceding-sibling::*[1][name()="a"]… – find the first preceding element, and ensure that it is an <a>. This appears to be working as desired.
- [following-sibling::node()[1][…]] – ensure that the first following node (of the found preceding <a>) matches some conditions
- not(text()) or normalize-space(.)="" – ensure that this following node is either not a text node, or that the normalized space of it is empty
Use:
XSLT-based verification:
When this transformation is applied on the provided XML document:
the XPath expression is evaluated and the nodes selected by this evaluation are copied to the output:
Update:
What is wrong with the XPath expression in the question?
The problem is here:
not(text())
This tests if the context node doesn’t have a text node child.
But the OP wants to test if the context node is a text node.
Solution:
Replace the above with:
not(self::text())
XSLT-based verification:
Now this transformation produces exactly the wanted result: