I am after a specific value from a webpage, namely the product name inside the h1 tag:
<div id="extendinfo_container">
<a href="/someproduct.html"><h1><strong>Product Name</strong></h1></a>
<div style="font-size:0;height:4px;"></div>
<p class="text_breadcrumbs">
<a href="/Our-Brands.html" target="_self"><img src="arrow_091.gif" align="absmiddle"/></a>
<a href="/someproduct.html" target="_self" class="link_breadcrumbs">Product Name</a><img src="arrow_091.gif" align="absmiddle"/>
<strong>Product Name</strong>
<div class="dotted_line_blue">
<img src="theme_shim.gif" height="1" width="100%" alt=" " />
</div>
</div>
This is a poorly structured website with more than one h1, so I cannot simply take the first result of getElementsByTagName('h1').
I want to be as specific as possible about which element I select, and this is the code I have:
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('http://url/to/website'));
// locate <div id="extendinfo_container"><a><h1><strong>(.*)</strong></h1></a> as product name
$x = new DOMXPath($doc);
$pName = $x->query('//div[@id="extendinfo_container"]/a/h1/strong');
var_dump($pName->nodeValue);
This returns null. What query do I need to use to get the content I want?
query() returns a DOMNodeList, which doesn't have a nodeValue property. Your XPath expression is fine; you just have to select one element from the list (e.g. the first, with item(0)) or iterate over the list with foreach.
Either approach gives you a DOMNode, which does have nodeValue and is what you're looking for.
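Both options can be sketched roughly as follows. This is a minimal example, not a drop-in fix: the $html string is a hypothetical stand-in for the downloaded page (trimmed from the markup in the question), and the variable names $first, $names and $node are only for illustration. libxml_use_internal_errors() is used here in place of the @ operator to keep parser warnings quiet.

```php
<?php
// Hypothetical stand-in for the fetched page, trimmed from the question's markup.
$html = <<<'HTML'
<div id="extendinfo_container">
<a href="/someproduct.html"><h1><strong>Product Name</strong></h1></a>
<p class="text_breadcrumbs"><strong>Product Name</strong></p>
</div>
HTML;

// Collect markup warnings internally instead of suppressing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$x = new DOMXPath($doc);

// query() returns a DOMNodeList, even when only one node matches.
$nodes = $x->query('//div[@id="extendinfo_container"]/a/h1/strong');

// Option 1: take the first match; item(0) returns null when the list is empty.
$first = $nodes->item(0);
$pName = $first !== null ? trim($first->nodeValue) : null;
var_dump($pName);

// Option 2: iterate over every match.
$names = [];
foreach ($nodes as $node) {
    $names[] = trim($node->nodeValue);
}
var_dump($names);
```

Checking item(0) against null before reading nodeValue avoids an error on pages where the expression matches nothing.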