This is the problem: the script I use stops at the first closing tag it finds.
I’m scraping a website, and this is the part of the site I want to extract.
<div class="i-want-this-div">
<div class="annoying-sub-div">
Bla bla bla
</div>
<div class="annoying-sub-div">
etc...
</div>
<div class="annoying-sub-div">
</div>
<div class="annoying-sub-div">
</div>
<div class="annoying-sub-div">
</div>
</div>
I want to display all those ‘annoying’ divs on my site (annoying because their closing tags break the script), but how do I do this?
This is my current approach: get the position of the opening tag, get the position of the closing tag, and extract that part from the string that holds the whole website source.
$startPos = strpos($siteIAmScreaping, '<div class="i-want-this-div">');
$endPos = strpos($siteIAmScreaping, '</div>', $startPos) + 8;
$annoyingDivs = substr($siteIAmScreaping, $startPos, $endPos-$startPos);
The problem is: I want it to stop at the main div's closing tag, not at the first closing tag it finds.
Use QueryPath (or phpQuery) for simplicity. You can then extract the
<div> content by class or id most easily.
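As a rough sketch of the same idea (selecting the div by class with a real parser instead of searching for tag positions), here is a version that uses PHP's built-in DOMDocument and DOMXPath rather than QueryPath, so it runs without installing anything. The sample markup and the "something-else" div are made up for illustration. The variable name is taken from the question.

```php
<?php
// Hypothetical sample input standing in for the scraped page source.
$siteIAmScreaping = <<<HTML
<html><body>
<div class="i-want-this-div">
<div class="annoying-sub-div">Bla bla bla</div>
<div class="annoying-sub-div">etc...</div>
</div>
<div class="something-else">ignored</div>
</body></html>
HTML;

$doc = new DOMDocument();
// Scraped HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($siteIAmScreaping);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match the div whose class attribute contains the exact token "i-want-this-div".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " i-want-this-div ")]';
$wanted = $xpath->query($query)->item(0);

$annoyingDivs = '';
if ($wanted !== null) {
    // Serialize each child node to build the inner HTML, nested divs included.
    foreach ($wanted->childNodes as $child) {
        $annoyingDivs .= $doc->saveHTML($child);
    }
}

echo $annoyingDivs;
```

Because the parser tracks nesting, the result ends at the closing tag of the outer div no matter how many inner divs it contains. With QueryPath or phpQuery the selection step should be shorter (a CSS selector such as div.i-want-this-div), but check the library's documentation for the exact calls.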