I’m hoping this is possible with Simple HTML DOM. I’m scraping a page that looks like this:
<h5>this is title 1</h5>
<img>
<img>
<img>
<h5>this is title 2</h5>
<img>
<img>
<h5>this is title 3</h5>
<img>
<img>
<img>
<img>
etc…
I’m trying to get it to look something like:
<h5>this is title 1</h5>
<img>
<h5>this is title 1</h5>
<img>
<h5>this is title 1</h5>
<img>
<h5>this is title 2</h5>
<img>
<h5>this is title 2</h5>
<img>
That means for each IMG I need to find and grab the nearest preceding H5, I think. There are no parent divs or any other structure to make it easier; the page is pretty much as I described it.
The code I’m using looks something like this (simplified):
foreach ($html->find('img') as $image) {
    // do stuff to the img
    $title = $html->find('h5')->prev_sibling();
    echo $title;
    echo $image;
}
Everything I’ve tried with prev_sibling gets me a “Fatal error: Call to a member function prev_sibling() on a non-object” and I’m wondering if what I’m trying to do is even possible with PHP Simple HTML Dom. I hope so, all the other scrapers I’ve tried were making me pull my hair out.
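Essentially, you want to select all h5 elements as well as all the img elements in a single find() call. Then you loop through them and check each one’s type. If it’s an h5 element, you update your $title variable but don’t echo anything. If it’s an img, you simply echo the $title before the image. There’s no need to go hunting for the h5, since you’ve already cached it. As for the fatal error: find('h5') without an index returns an array of elements rather than a single element, so there is no object to call prev_sibling() on.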
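A minimal sketch of that loop, assuming simple_html_dom.php sits next to the script and using made-up sample markup and image filenames. It relies on find() returning the matches of a comma-separated selector in document order, which is worth confirming against your version of the library:

```php
<?php
// Assumption: simple_html_dom.php is in the same directory; adjust the path as needed.
require_once 'simple_html_dom.php';

// Hypothetical sample markup in the flat shape described in the question.
$markup = '<h5>this is title 1</h5><img src="a.jpg"><img src="b.jpg">'
        . '<h5>this is title 2</h5><img src="c.jpg">';

$html = str_get_html($markup);

$title  = '';  // most recent h5 seen so far
$output = '';

// One comma-separated selector returns both tag types in a single list.
foreach ($html->find('h5, img') as $element) {
    if ($element->tag === 'h5') {
        // Remember the heading, but print nothing yet.
        $title = $element->outertext;
    } else {
        // An img: emit the cached heading, then the image itself.
        $output .= $title . "\n" . $element->outertext . "\n";
    }
}

echo $output;
```

Any per-image processing ("do stuff to the img") would go in the else branch, before the image is appended to the output.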