I am trying to get the text from a specific node’s parent. For example:
<td colspan="1" rowspan="1">
    <span>
        <a class="info" shape="rect"
           rel="empLinkData" href="/employee.htm?id=8468524">
            Jack Johnson
        </a>
    </span>
    (*)
</td>
I am able to successfully process the anchor tag by using:
$xNodes = $xpath->query('//a[@class="info"][@rel="empLinkData"]');
// $xNodes contains employee ids and names
foreach ($xNodes as $xNode)
{
    $sLinktext = @$xNode->firstChild->data;
    $sLinkurl = 'http://www.company.com' . $xNode->getAttribute('href');
    if ($sLinktext != '' && $sLinkurl != '')
    {
        echo '<li><a href="' . $sLinkurl . '">' .
            $sLinktext . '</a></li>';
    }
}
Now, I need to retrieve the text from the <td> tag (in this case, the (*) appearing right after the span tag closes), but I can’t seem to refer to it properly.
The xpath for this that seems to make the most sense to me is:
$xNodes = $xpath->query('//a[@class="info"][@rel="empLinkData"]/ancestor::*');
but it returns the wrong data, pulled from elements nested above this markup.
It’s not necessary to retreat back up the tree. Instead, directly select the td that contains the relevant element:
Edit: As @Dimitre rightly pointed out, this selects all text children. Your td has two such nodes: the whitespace-only text node that precedes the span and the text node that follows it. If you only want the second text node, then use:
Or:
As you can see, the resulting expressions are essentially the same, but you do need to target the correct text node (if you want only one). Note also that if the target text is truly in a td then it’s safer to target that element type directly (without wildcards). As this is HTML, your actual document almost certainly contains several other elements, including multiple other anchors that you may not want to target.
Sample PHP:
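The original expressions and sample did not survive in this copy of the answer, so what follows is only a minimal sketch of the approach described above, not the answer's own code. It assumes the markup from the question (wrapped here in a hypothetical table so it parses as HTML) and uses an XPath of my own choosing: it selects the td that contains the anchor, then takes the text nodes that follow the span. That sidesteps the whitespace-only node before the span without relying on a positional index.

```php
<?php
// Markup from the question; the surrounding <table><tr> is an assumption
// added only so the fragment parses as a complete HTML table.
$html = <<<'HTML'
<table><tr>
<td colspan="1" rowspan="1">
<span>
<a class="info" shape="rect"
rel="empLinkData" href="/employee.htm?id=8468524">
Jack Johnson
</a>
</span>
(*)
</td>
</tr></table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Assumed expression (not taken from the original answer):
// 1. pick the td whose span holds the wanted anchor,
// 2. step to that span,
// 3. take only the text nodes that come after it inside the td.
$query = '//td[span/a[@class="info"][@rel="empLinkData"]]'
       . '/span/following-sibling::text()';

$notes = [];
foreach ($xpath->query($query) as $textNode) {
    $text = trim($textNode->nodeValue);
    if ($text !== '') {             // skip any whitespace-only nodes
        $notes[] = $text;
    }
}

print_r($notes);
```

With the sample markup, $notes should end up holding the single string (*). If the td can contain more text after the span than you want, the predicate can be tightened, for example by appending [1] to take only the first following text node.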