Here is an example of the HTML I need to parse into a PHP program:
<div id="dump-list">
<div class="dump-row">
<div class="dump-location odd" data-jmapping="{id: 35, point: {lng: -73.00898601, lat: 41.71727402}, category: 'office'}">
<div class="SingleLinkNoTx">
<a href="#10" class="loc-link">Acme Software</a><br/><strong>John Doe, MBA</strong><br/>123 Main St.<br />New York, NY 10036<br /><strong class="telephone">(212) 555-1234</strong><br/>
</div><!-- END.SingleLinkNoTx -->
<a href="http://www.example.com" target="_blank" class="web_link">Visit Website</a><span><br />(0.3 miles)</span>
<div class="loc-info">
<div class="loc-info-text ">
John Doe, MBA<br /><a href="http://maps.google.com/?daddr=41.71727402,-73.00898601" target="_blank">Get Directions »</a>
</div>
</div>
</div>
This is the information I want to extract from the above HTML example into PHP:
lng: -73.00898601, lat: 41.71727402
category: 'office'
Acme Software
John Doe, MBA
123 Main St.
New York, NY 10036
(212) 555-1234
http://www.example.com
I have tried using PHP Simple HTML DOM Parser, but I’m new to it and can’t find a working PHP example that pertains to what I need to do. I tried the PHP code below to understand how it works, but var_dump($e) produces a huge amount of output, including messages about recursion, so I’m lost as to how to really use this. I would greatly appreciate some help!
$e = $html->find('.dump-location', 0)->find('.SingleLinkNoTx', 0);
echo $e;
var_dump($e);
Use XPath to find and extract elements in an HTML/XML document – specifically the SimpleXMLElement::xpath method.
The following example will find the telephone number for a location:
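A minimal sketch of such a lookup, using a trimmed copy of the markup from the question. One assumption that is not part of the answer itself: because real-world HTML is rarely well-formed XML, the sketch parses it with DOMDocument::loadHTML first and then converts the tree with simplexml_import_dom, so that SimpleXMLElement::xpath can be called on it. The variable names are illustrative only.

```php
<?php
// Trimmed copy of the markup from the question.
$html = <<<'HTML'
<div id="dump-list">
<div class="dump-row">
<div class="dump-location odd" data-jmapping="{id: 35, point: {lng: -73.00898601, lat: 41.71727402}, category: 'office'}">
<div class="SingleLinkNoTx">
<a href="#10" class="loc-link">Acme Software</a><br/><strong>John Doe, MBA</strong><br/>123 Main St.<br />New York, NY 10036<br /><strong class="telephone">(212) 555-1234</strong><br/>
</div>
<a href="http://www.example.com" target="_blank" class="web_link">Visit Website</a>
</div>
</div>
</div>
HTML;

// Assumption: parse leniently as HTML, since the page may not be valid XML.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Convert the DOM tree to SimpleXML to get access to xpath().
$xml = simplexml_import_dom($dom);

// xpath() returns an array of SimpleXMLElement objects, one per match.
$phones = $xml->xpath(
    '//*[contains(@class, "dump-location")]'
    . '/div[@class="SingleLinkNoTx"]'
    . '/strong[@class="telephone"]'
);

foreach ($phones as $phone) {
    // Casting a SimpleXMLElement to string yields its text content.
    echo trim((string) $phone), PHP_EOL;
}
```

Run against the snippet above, this should print (212) 555-1234, one line per matching location.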
The most complex part is the XPath expression. A quick breakdown:
//*[contains(@class, "dump-location")] : finds every element that has a dump-location class.
/ : steps down to the children of the dump-location parent.
div[@class="SingleLinkNoTx"] : matches a DIV element that has a SingleLinkNoTx class (and no other class name).
strong (with its class test) : matches STRONG tags with a telephone class.

Running this XPath expression against the HTML snippet provided in the question returns an array of the matching elements, which is fairly easy to iterate over and extract information from.
If you know the document structure, it’s possible to construct an XPath expression for each piece of information you want to extract. Or, it might be simpler to use a more general XPath expression (say, an expression that retrieves all dump-location elements) and manually iterate through the elements.
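The second approach could look roughly like the sketch below: select every dump-location element, then pull each field out with a short relative XPath expression. Several things here are assumptions rather than part of the answer: the DOMDocument::loadHTML step (used so that imperfect HTML still parses), the hypothetical $first helper, the regular expressions for data-jmapping (its value is not valid JSON, so json_decode would not work on it), and the use of dom_import_simplexml to read the address lines, which are bare text nodes between the br tags.

```php
<?php
// Trimmed copy of the markup from the question.
$html = <<<'HTML'
<div id="dump-list">
<div class="dump-row">
<div class="dump-location odd" data-jmapping="{id: 35, point: {lng: -73.00898601, lat: 41.71727402}, category: 'office'}">
<div class="SingleLinkNoTx">
<a href="#10" class="loc-link">Acme Software</a><br/><strong>John Doe, MBA</strong><br/>123 Main St.<br />New York, NY 10036<br /><strong class="telephone">(212) 555-1234</strong><br/>
</div>
<a href="http://www.example.com" target="_blank" class="web_link">Visit Website</a>
</div>
</div>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xml = simplexml_import_dom($dom);

// Hypothetical helper: text of the first match of $path under $ctx, or null.
$first = static function (SimpleXMLElement $ctx, string $path): ?string {
    $hits = $ctx->xpath($path);
    return $hits ? trim((string) $hits[0]) : null;
};

$locations = [];
foreach ($xml->xpath('//*[contains(@class, "dump-location")]') as $loc) {
    $row = ['lng' => null, 'lat' => null, 'category' => null];

    // data-jmapping has unquoted keys and single quotes, so match it with regexes.
    $mapping = (string) $loc['data-jmapping'];
    if (preg_match('/lng:\s*(-?[\d.]+),\s*lat:\s*(-?[\d.]+)/', $mapping, $m)) {
        $row['lng'] = $m[1]; // kept as strings to preserve the exact digits
        $row['lat'] = $m[2];
    }
    if (preg_match("/category:\s*'([^']*)'/", $mapping, $m)) {
        $row['category'] = $m[1];
    }

    // Paths starting with "./" are relative to the current dump-location element.
    $row['name']    = $first($loc, './div[@class="SingleLinkNoTx"]/a[@class="loc-link"]');
    $row['contact'] = $first($loc, './div[@class="SingleLinkNoTx"]/strong[not(@class)]');
    $row['phone']   = $first($loc, './div[@class="SingleLinkNoTx"]/strong[@class="telephone"]');
    $row['website'] = $first($loc, './a[@class="web_link"]/@href');

    // Address lines are loose text nodes, so walk the underlying DOM children.
    $row['address'] = [];
    $div = $loc->xpath('./div[@class="SingleLinkNoTx"]');
    if ($div) {
        foreach (dom_import_simplexml($div[0])->childNodes as $child) {
            $text = trim($child->nodeValue);
            if ($child->nodeType === XML_TEXT_NODE && $text !== '') {
                $row['address'][] = $text;
            }
        }
    }

    $locations[] = $row;
}

print_r($locations);
```

Each entry of $locations then holds one location as a plain array, which avoids the recursive var_dump output described in the question.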