I have this query, which extracts the posts that have been “liked” more than 5 times.
//div[@class="pin"]
[.//span[@class = "LikesCount"]
[substring-before(normalize-space(text())," ") > 5]]
I’d like to extract and store additional information such as titles, image URLs, like counts, repin counts, …
How can I extract them all?
- Multiple XPath queries?
- Digging into the nodes of the resulting posts while iterating in PHP with PHP functions?
- …
A markup example follows:
<div class="pin">
<p class="description">gorgeous couch <a href="#">#modern</a></p>
[...]
<div class="PinHolder">
<a href="/pin/56787645270909880/" class="PinImage ImgLink">
<img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg"
alt="Krizia"
data-componenttype="MODAL_PIN"
class="PinImageImg"
style="height: 288px;">
</a>
</div>
<p class="stats colorless">
<span class="LikesCount">
22 likes
</span>
<span class="RepinsCount">
6 repins
</span>
</p>
[...]
</div>
As you are already using XPath in your code, I would suggest extracting that information with XPath too. Here is an example of how to extract the description.
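A minimal sketch of that approach, assuming the page is parsed with PHP's DOMDocument and queried through DOMXPath. The sample markup is copied from the question with the elided `[...]` parts left out, and the array keys (`description`, `image`, `likes`, `repins`) are illustrative names, not something from the original code. Each field is read with a relative XPath expression evaluated against the matched pin node:

```php
<?php
// Sample markup taken from the question (elided parts omitted).
$html = <<<'HTML'
<div class="pin">
  <p class="description">gorgeous couch <a href="#">#modern</a></p>
  <div class="PinHolder">
    <a href="/pin/56787645270909880/" class="PinImage ImgLink">
      <img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg"
           alt="Krizia" class="PinImageImg">
    </a>
  </div>
  <p class="stats colorless">
    <span class="LikesCount">
      22 likes
    </span>
    <span class="RepinsCount">
      6 repins
    </span>
  </p>
</div>
HTML;

// Parse the HTML; libxml warnings about imperfect markup are collected, not printed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The query from the question: pins with more than 5 likes.
$query = '//div[@class="pin"]'
       . '[.//span[@class="LikesCount"]'
       . '[substring-before(normalize-space(text()), " ") > 5]]';

$pins = [];
foreach ($xpath->query($query) as $pin) {
    // The second argument makes each expression relative to the current pin.
    $pins[] = [
        'description' => $xpath->evaluate('normalize-space(.//p[@class="description"])', $pin),
        'image'       => $xpath->evaluate('string(.//img[@class="PinImageImg"]/@src)', $pin),
        'likes'       => (int) $xpath->evaluate(
            'substring-before(normalize-space(.//span[@class="LikesCount"]), " ")', $pin),
        'repins'      => (int) $xpath->evaluate(
            'substring-before(normalize-space(.//span[@class="RepinsCount"]), " ")', $pin),
    ];
}

print_r($pins);
```

This keeps a single query for selecting the pins and reuses the same DOMXPath object for the per-pin details, so no second pass over the whole document is needed. Further fields can be added the same way, as long as the expression starts with `.` so that it stays scoped to the current pin.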