I have an HTML document that I would like to query using C# and XPath. What I am searching for is an XPath expression – not XSLT, C#, PHP or any other language-specific code samples. Any help will be highly appreciated but the XPath expression is all I need :).
<tr>
<td>
<p>
<span>text</span>
</p>
</td>
<td>
<p>
<span>text</span>
</p>
</td>
</tr>
<tr>
<td>
<p>
<span>This text is static and will never change</span>
</p>
</td>
<td>
<p>
<span>Bla bla bla .... more bla bla bla</span>
</p>
</td>
</tr>
<tr>
<td>
<p>
<span>text</span>
</p>
</td>
<td>
<p>
<span>text</span>
</p>
</td>
</tr>
The XPath expression that I am looking for will extract the text that is currently represented by the string instance “Bla bla bla …. more bla bla bla”. This text will vary from HTML document to HTML document but one string is ALWAYS the same. In this case that string is represented as “This text is static and will never change”.
“This text is static and will never change” and “Bla bla bla …. more bla bla bla” are of course not the real strings. I replaced them because they are domain specific, not relevant to the problem, and reveal sensitive data that must not be shown!
Again, any help will be highly appreciated. Thanks.
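A minimal sketch of the lookup described above, assuming the rows are wrapped in a single root element (the `<table>` wrapper here is an assumption, added only so the fragment parses as XML). In a full XPath 1.0 engine, one expression that fits the description would be `//tr[td[1]/p/span = 'This text is static and will never change']/td[2]/p/span/text()`. Python's standard `xml.etree.ElementTree` supports only a subset of XPath (and the `[.='text']` predicate needs Python 3.7 or later), so the sketch uses an equivalent path that starts at the static span, climbs to its row, and descends into the second cell. The name `find_variable_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the question's markup; the <table> wrapper is an
# assumption added so the fragment has a single root element.
DOC = """
<table>
  <tr>
    <td><p><span>text</span></p></td>
    <td><p><span>text</span></p></td>
  </tr>
  <tr>
    <td><p><span>This text is static and will never change</span></p></td>
    <td><p><span>Bla bla bla .... more bla bla bla</span></p></td>
  </tr>
  <tr>
    <td><p><span>text</span></p></td>
    <td><p><span>text</span></p></td>
  </tr>
</table>
"""

# The label that is known in advance (must not contain a single quote,
# because it is embedded in a quoted XPath string below).
STATIC = "This text is static and will never change"

# ElementTree implements only part of XPath 1.0, so the path starts at the
# matching span, climbs span -> p -> td -> tr with "..", then descends into
# the second td of that row.
PATH = f".//span[.='{STATIC}']/../../../td[2]/p/span"


def find_variable_text(xml_text):
    """Return the text in the cell next to the static label, or None."""
    root = ET.fromstring(xml_text)
    return root.findtext(PATH)


if __name__ == "__main__":
    print(find_variable_text(DOC))
```

Run against the sample above, this prints `Bla bla bla .... more bla bla bla`; if no row contains the static string, `find_variable_text` returns `None`.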
Use:
When this XPath expression is evaluated against the following XML document (obtained by turning the provided malformed HTML into a well-formed XML document):
the text node with value "text to extract" is selected, as required.
XSLT-based verification:
When this transformation is applied to the same XML document (above), the XPath expression is evaluated and the result of this evaluation is copied to the output:
Alternatively, if you know the text but want to select an element containing it (say td), then use:
Again with XSLT-based verification:
The result now is:
Still another guess:
If you want to find the closest preceding text node, then use:
XSLT-based verification:
Result:
Update:
After the latest update by the OP, and his new explanation, the XPath expression he is looking for is:
This selects the text node with string value: