Using XPath to parse this link:
$html = '<a href="/browse/product.do?cid=1&vid=1&pid=1" class="productItemName">what is going on here</a>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$selectors['link'] = '//a/@href';
$links_nodeList = $xpath->query($selectors['link']);
foreach ($links_nodeList as $link) {
    $link->nodeValue = str_replace("http://www.test.com",'',$link->nodeValue); // relativize link
    $links[] = $link->nodeValue;
}
echo("<p>links</p>");
echo("<pre>");
print_r($links);
echo("</pre>");
gives the result:
Warning: main() [function.main]: unterminated entity reference vid=1&pid=1 in C:\Users\dir\public_html\whatisgoingon.php on line 14
links
Array
(
    [0] => /browse/product.do?cid=1
)
This line is causing the error and the truncation of the link. What is going on here?
$link->nodeValue = str_replace("http://www.test.com",'',$link->nodeValue);
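The & inside the URL is decoded when you reference nodeValue, so the string you assign back contains a bare &, which is then read as the start of an entity reference. Wrap the new value in htmlentities() before assigning it; the warning goes away and the link is no longer truncated.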
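A minimal sketch of that fix, reusing the markup and XPath query from the question. The $relative variable and the libxml_use_internal_errors() calls around loadHTML() are additions for illustration and not part of the original code; the latter only keeps parse warnings about the unescaped & in the sample markup out of the output.

```php
<?php
$html = '<a href="/browse/product.do?cid=1&vid=1&pid=1" class="productItemName">what is going on here</a>';

$dom = new DOMDocument();

// Assumption: collect parse warnings internally while loading the sample
// markup, then restore the previous error-handling setting.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

$links = [];
foreach ($xpath->query('//a/@href') as $link) {
    // Reading nodeValue returns the decoded string (literal "&").
    $relative = str_replace('http://www.test.com', '', $link->nodeValue);

    // Escape "&" as "&amp;" before writing, since the assigned string
    // is parsed for entity references.
    $link->nodeValue = htmlentities($relative);

    $links[] = $link->nodeValue;
}

print_r($links);
```

htmlspecialchars() should work here as well, since it also turns & into &amp; and escapes only the HTML special characters. If the node itself does not need to change, pushing the str_replace() result straight onto $links avoids the write to nodeValue altogether.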