I am writing a scraper that extracts product prices. I need to skip wrapper elements such as this site's div class, but the class names differ from site to site, so I cannot hard-code them. As you can see below, the first element my scraper returns looks like this:
1 - <div class="ProductPrice">
<span id="ctl00_ContentPlaceHolder1_Category1_ctrl_0_ctrl_7_mainGrid_ctl00_PUnit_lblPriceWithTax">47,00 TL</span>
</div>
Then I get a second match, which is the inner tag with the same price (tag names can vary between sites, so please take that into account before answering):
2 - <span id="ctl00_ContentPlaceHolder1_Category1_ctrl_0_ctrl_7_mainGrid_ctl00_PUnit_lblPriceWithTax">47,00 TL</span>
My code is:
Elements allElements = newDocument.getAllElements();
for (int j = 0; j < allElements.size(); j++) {
    Element element = allElements.get(j);
    // text() includes the text of all descendants, so a wrapper matches too.
    if (element.text().matches(regex)) {
        // Write the match to the console.
    }
}
I solved this by checking whether the element has children. If it does, I check whether the children match the regex, and if they do not, I keep iterating through the children to find the acceptable element.
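The child check described above can be sketched roughly as follows. This is only one reading of that approach: an element is kept when its text matches the regex and none of its child elements match, so only the innermost match is reported. To stay runnable without dependencies, the sketch uses the JDK's built-in `org.w3c.dom` parser rather than jsoup, and it therefore needs well-formed markup. The class name `DeepestPriceMatch`, the helper names, and the `PRICE` pattern are assumptions made for illustration, since the original regex is not shown. With jsoup the same check would presumably be written with `element.children()` and `element.text()`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DeepestPriceMatch {
    // Hypothetical price pattern (e.g. "47,00 TL"); the question's real regex is not shown.
    static final Pattern PRICE = Pattern.compile("\\d+,\\d{2} TL");

    // True when the element's full text (including descendants) is a price.
    static boolean matches(Element e) {
        return PRICE.matcher(e.getTextContent().trim()).matches();
    }

    // Adds an element only if it matches and none of its child elements match.
    static void collect(Element e, List<Element> out) {
        boolean childMatched = false;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element) {
                Element child = (Element) n;
                if (matches(child)) {
                    childMatched = true;
                }
                collect(child, out); // keep descending to reach the innermost match
            }
        }
        if (matches(e) && !childMatched) {
            out.add(e);
        }
    }

    // Parses a well-formed fragment and returns the innermost matching elements.
    static List<Element> findPrices(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Element> out = new ArrayList<>();
            collect(doc.getDocumentElement(), out);
            return out;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse markup", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<div class=\"ProductPrice\"><span id=\"price\">47,00 TL</span></div>";
        for (Element e : findPrices(xml)) {
            System.out.println(e.getTagName() + ": " + e.getTextContent().trim());
        }
    }
}
```

Because the check looks only at whether a child matches, it does not depend on tag or class names, which is what the varying sites require. If the price always sits directly in an element's own text, jsoup's `Element.ownText()` might be a shorter route than recursing.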