Given the following div element:
<div class="info">
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>
</div>
I want to retrieve the contents of the span with class “b”. However, some of the divs I want to parse lack the last two spans (of class “b” and “c”). For these divs, I want the contents of the span with class “a”. Is it possible to create a single XPath expression that selects this?
If not, is it possible to create a selector that retrieves the entire contents of the div, i.e.:
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>
If I can do that, I can use a regex to find the data I want. (I can select the text within the div, but I'm not sure how to select the tags as well. Selecting just the text yields 123456789.)
The XPath expression should be something like:
//div/span[@class='b'] | //div[not(span[@class='b'])]/span[@class='a']
The expression to the left of the union operator | selects all the b-class spans inside all divs. The expression on the right-hand side first selects all divs that do not have a b-class span and then selects their a-class span. The | operator combines the two result sets. See here for selecting nodes with not() and here for combining results with the | operator.
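Python's standard library cannot evaluate that union directly, because xml.etree.ElementTree implements only a small XPath subset (no | and no not()). A minimal sketch of the same per-div logic, assuming the markup is well-formed XML; the wrapping <root> element and the second div are hypothetical test data:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one div with all three spans, one with only span "a".
SAMPLE = """<root>
<div class="info">
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>
</div>
<div class="info">
<a href="/s/abc.html" class="title">other</a>
<span class="a">321</span>
</div>
</root>"""


def pick_value(div):
    """Return the text of span.b if present, otherwise the text of span.a.

    Procedural equivalent, per div, of
    //div/span[@class='b'] | //div[not(span[@class='b'])]/span[@class='a']
    """
    span = div.find("span[@class='b']")
    if span is None:
        span = div.find("span[@class='a']")
    return None if span is None else span.text


def pick_values(markup):
    root = ET.fromstring(markup)
    # Attribute predicates like [@class='info'] are in ElementTree's subset.
    return [pick_value(div) for div in root.iterfind(".//div[@class='info']")]


print(pick_values(SAMPLE))  # ['456', '321']
```

If lxml (a third-party package) is available, the full union expression can instead be passed unchanged to an element's xpath() method, which implements XPath 1.0.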
As for the second part of your question, have a look here.
Using node() in your XPath, you can select all child nodes (elements and text) of the selected node. So you can get everything in the div returned by
//div[@class='info']/node()
for further processing by other means.
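ElementTree also has no node() test, but the same result (the div's children serialized with their tags) can be approximated by joining the div's leading text with each serialized child. A sketch assuming well-formed markup; inner_markup and pick_with_regex are hypothetical helpers, the latter illustrating the regex fallback from the question, which breaks if the span's attributes change:

```python
import re
import xml.etree.ElementTree as ET

DIV = """<div class="info">
<a href="/s/xyz.html" class="title">title</a>
<span class="a">123</span>
<span class="b">456</span>
<span class="c">789</span>
</div>"""


def inner_markup(elem):
    """Serialize everything inside elem: its leading text plus each child
    element with its tags, approximating elem/node() serialized in order."""
    parts = [elem.text or ""]
    for child in elem:
        # ET.tostring also emits the child's tail text, so text between
        # siblings is preserved.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def pick_with_regex(markup):
    """Prefer the contents of span.b, fall back to span.a."""
    for cls in ("b", "a"):
        m = re.search(rf'<span class="{cls}">([^<]*)</span>', markup)
        if m:
            return m.group(1)
    return None


markup = inner_markup(ET.fromstring(DIV))
print(markup)
print(pick_with_regex(markup))  # 456
```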