I have this HTML/XML:
\t\t\t\t\t \r\n\t\t
<a href="/test.aspx">
<span class=test>
<b>blabla</b>
</span>
</a>
<br/>
this is the text I want
<br/>
<span class="test">
<b>code: 123</b>
</span>
<br/>
<span class="test"></span>
\t\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t
In C# 4 I use the HtmlAgilityPack library to select the node with XPath and read its InnerText property, which returns all the text inside the node, including the text of its descendants. How can I get only the text “this is the text I want”?
Using /text() returns only the leading whitespace: \t\t\t\t\t \r\n\t\t
From the example given, this XPath will get you all text nodes underneath the div element, in this case test2.
If you could elaborate on the question, we might be better able to help. The div contains three children: a span element, a text node, and a b element. The span and the b each have a text node child. Using XPath, you can select child elements only (/div/*), child text nodes only (/div/text()), or child nodes of every type (/div/node()).
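To make the three node tests concrete, here is a minimal sketch using HtmlAgilityPack. The markup is a hypothetical stand-in for the div described above (the values test1 and test3 are invented for illustration), and the class name is arbitrary. Under standard XPath semantics the three expressions should match 2, 1 and 3 nodes respectively.

```csharp
using System;
using HtmlAgilityPack;

class NodeTestDemo
{
    static void Main()
    {
        // Hypothetical markup: a div with a span, a bare text node and a b element.
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><span>test1</span>test2<b>test3</b></div>");

        string[] xpaths = { "/div/*", "/div/text()", "/div/node()" };
        foreach (string xpath in xpaths)
        {
            // SelectNodes returns null rather than an empty collection when nothing matches.
            HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
            int count = nodes == null ? 0 : nodes.Count;
            Console.WriteLine("{0} matched {1} node(s)", xpath, count);
        }
    }
}
```

Note that /div/text() matches only the bare text node; the text inside the span and the b belongs to those elements, not to the div directly.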
EDIT: /text() returns only the text nodes that are direct children of the node it is applied to, not text nested inside child elements. In this case I would expect it to return a node list containing 3 text nodes:
Are you perhaps only selecting the first node in the resultant node list?
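One way to check this is to iterate over every node that text() returns instead of taking only the first one, and to skip the nodes that contain nothing but whitespace. The following is a sketch under assumptions: the question does not show the parent element, so a wrapping div with id "container" is invented here, and the class name is arbitrary.

```csharp
using System;
using HtmlAgilityPack;

class DirectTextDemo
{
    static void Main()
    {
        // Assumption: the fragment lives inside <div id="container">;
        // the real parent element is not shown in the question.
        string html =
            "<div id=\"container\">\t\t\t\t\t \r\n\t\t" +
            "<a href=\"/test.aspx\"><span class=test><b>blabla</b></span></a>\r\n" +
            "<br/>\r\nthis is the text I want\r\n<br/>\r\n" +
            "<span class=\"test\"><b>code: 123</b></span>\r\n<br/>\r\n" +
            "<span class=\"test\"></span>\t\t\r\n\t\t\t</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // text() matches only the direct text-node children of the div.
        HtmlNodeCollection textNodes =
            doc.DocumentNode.SelectNodes("//div[@id='container']/text()");
        if (textNodes == null)
        {
            return; // SelectNodes returns null when there is no match.
        }

        foreach (HtmlNode node in textNodes)
        {
            // Drop the tab/newline padding; whitespace-only nodes become empty.
            string text = node.InnerText.Trim();
            if (text.Length > 0)
            {
                Console.WriteLine(text);
            }
        }
    }
}
```

SelectSingleNode with the same expression would return only the first match, which here is the leading run of tabs and newlines, consistent with what the question reports.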
There are also a few well-formedness issues; for example, your <br> should probably be <br/>.