I need to scrape the number 622104 from this HTML.
How can I get the number?
<div class="numbersBackground">
<div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl00_numberPanel" class="number">
<div class="numberWrapper"><span>6</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl01_numberPanel" class="number">
<div class="numberWrapper"><span>2</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl02_numberPanel" class="number">
<div class="numberWrapper"><span>2</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl03_commaPanel" class="comma">
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl04_numberPanel" class="number">
<div class="numberWrapper"><span>1</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl05_numberPanel" class="number">
<div class="numberWrapper"><span>0</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl06_numberPanel" class="number">
<div class="numberWrapper"><span>4</span></div>
</div>
</div>
Using the `DOMDocument` class to parse the HTML string, thanks to its `loadHTML` method, you could use an XPath query (using the `DOMXPath` class) to find all `<div>` tags with a `class="numberWrapper"` attribute.
Then, iterate over those, concatenating their content to a variable which, at the end of the loop, will contain your number.
For example, you could have this kind of code:
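A minimal sketch of that approach, assuming the markup from the question is held in a string named `$html` (the variable names are chosen here for illustration, and the `id` attributes are left out for brevity):

```php
<?php
// Assumption: $html holds the markup from the question (ids omitted for brevity).
$html = <<<HTML
<div class="numbersBackground">
<div class="number"><div class="numberWrapper"><span>6</span></div></div>
<div class="number"><div class="numberWrapper"><span>2</span></div></div>
<div class="number"><div class="numberWrapper"><span>2</span></div></div>
<div class="comma"></div>
<div class="number"><div class="numberWrapper"><span>1</span></div></div>
<div class="number"><div class="numberWrapper"><span>0</span></div></div>
<div class="number"><div class="numberWrapper"><span>4</span></div></div>
</div>
HTML;

// Parse the HTML string into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Find every <div> whose class attribute is exactly "numberWrapper".
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="numberWrapper"]');

// Concatenate the text of each match, in document order.
$number = '';
foreach ($divs as $div) {
    $number .= trim($div->textContent);
}

echo $number, PHP_EOL;
```

The result is built as a string; cast it with `(int) $number` if you need an integer.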
And, as output, you'd get: 622104
You could also use an XPath query that selects the `<span>` children of those `<div>`s, to make sure you're only working with the `<span>` tags.
Here, as the `<div>`s only contain the `<span>`, the result will be the same, but it might differ in other situations.
Of course (just to make sure it's said): regular expressions are not the right way to extract information from an HTML string.
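The `<span>`-only variant mentioned above could look like the sketch below. The sample markup is hypothetical: the second `<div>` carries extra text, which is not the case in the question's HTML, to show where the two queries would diverge.

```php
<?php
// Hypothetical markup: the second wrapper holds text outside its <span>.
$html = '<div class="numberWrapper"><span>6</span></div>'
      . '<div class="numberWrapper"><span>2</span> extra text</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Query on the <div>s: picks up everything inside each wrapper.
$fromDivs = '';
foreach ($xpath->query('//div[@class="numberWrapper"]') as $div) {
    $fromDivs .= $div->textContent;
}

// Query on the <span>s: picks up only the digits.
$fromSpans = '';
foreach ($xpath->query('//div[@class="numberWrapper"]/span') as $span) {
    $fromSpans .= $span->textContent;
}

echo $fromDivs, PHP_EOL;   // includes the stray text
echo $fromSpans, PHP_EOL;  // digits only
```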
Edit after the comment:
If there are other `<div>`s you don't want to take into account, you'll have to find another XPath query, one that matches only what you want to extract.
For example, maybe something more specific, anchored on the surrounding elements, would do the trick.
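One possible shape for such a query, as a sketch only: it assumes the digits always sit under the `numbersBackground` container shown in the question, and the stray wrapper at the top of the sample markup is hypothetical, added to show what gets excluded.

```php
<?php
// Hypothetical markup: a numberWrapper outside the container should be ignored.
$html = <<<HTML
<div class="numberWrapper"><span>9</span></div>
<div class="numbersBackground">
<div class="number"><div class="numberWrapper"><span>6</span></div></div>
<div class="comma"></div>
<div class="number"><div class="numberWrapper"><span>2</span></div></div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Spell out the full path so only digits inside the container match.
$query = '//div[@class="numbersBackground"]'
       . '/div[@class="number"]'
       . '/div[@class="numberWrapper"]/span';

$number = '';
foreach ($xpath->query($query) as $span) {
    $number .= $span->textContent;
}

echo $number, PHP_EOL;
```

Note that `@class="number"` is an exact string comparison, so it would not match an element carrying several classes.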
Of course, it's up to you to find out exactly what matches the structure of your HTML document.
If you want to download the HTML, you have two solutions:
If `allow_url_fopen` is enabled on your server, you can use `DOMDocument::loadHTMLFile()`, passing it the URL as a parameter.
As a sidenote, if you get warnings because your HTML is not valid, you'll want to take a look at the `libxml_use_internal_errors()` function 😉
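A small sketch of how `libxml_use_internal_errors()` can be used around the load. The invalid sample markup and the URL in the comment are assumptions made for illustration.

```php
<?php
// Deliberately sloppy markup (hypothetical): an unknown, unclosed <foo> tag.
$html = '<div class="numberWrapper"><span>6</span><foo></div>';

// Collect parse problems internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
// With allow_url_fopen enabled, a URL could be loaded directly instead:
// $dom->loadHTMLFile('http://www.example.com/'); // hypothetical URL

// Inspect what the parser complained about, then reset the error buffer.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$span = $xpath->query('//div[@class="numberWrapper"]/span')->item(0);

echo $span->textContent, PHP_EOL;
echo count($errors), ' parse problem(s) recorded', PHP_EOL;
```

How many problems get recorded depends on the libxml version in use, so it is safer to treat `$errors` as diagnostic information than to rely on an exact count.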