I’m writing a crawler with Wombat, and I’m using CSS selectors rather than XPath. I have a tricky selection here that I can’t get working with CSS.
I have div elements that I want to grab from a page:
<div class="do_cat_ads_box"> ... </div>
<div class="do_cat_ads_box2"> ... </div>
<div class="do_cat_ads_box" style=".."> ...</div>
<div class="do_cat_ads_box2" style=".."> ... </div>
But the elements with a ‘style’ attribute are garbage (ads) that I don’t need.
So my question is: can I grab all div elements with the classes ‘do_cat_ads_box’ and ‘do_cat_ads_box2’, but skip the div elements that have a ‘style’ attribute?
I ended up with something like this, and it is not working:
application 'css=div.do_cat_ads_box2, div.do_cat_ads_box, div.do_cat_ads_box:not(@style)', :iterator do
href 'css=div.do_cat_ads_image a @href'
name 'css=div.do_cat_ads_detail a'
end
If it’s not doable with CSS selectors, there is always the XPath way. But I’m very interested in a CSS-selector approach.
Attribute selectors in CSS use [attr] notation. The @attr notation pertains to attribute locators (as well as XPath).

Assuming Wombat supports the CSS syntax for attribute selectors, try changing :not(@style) to :not([style]) and rewriting your class selectors to the following:
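
One way to write that, as a sketch: each class gets its own :not([style]). In a comma-separated list every selector is matched independently, so in the version from the question the first two selectors still match the styled ad divs, and the :not() on the third changes nothing. The helper name unstyled_divs is hypothetical (it is not part of Wombat), and whether Wombat's css= locator accepts :not() with an attribute selector is an assumption that has not been checked against Wombat itself.

```ruby
# Hypothetical helper, not part of Wombat: builds one selector per class,
# each one excluding divs that carry a style attribute.
def unstyled_divs(*classes)
  classes.map { |c| "div.#{c}:not([style])" }.join(', ')
end

selector = unstyled_divs('do_cat_ads_box', 'do_cat_ads_box2')
puts selector
# prints: div.do_cat_ads_box:not([style]), div.do_cat_ads_box2:not([style])

# Inside the crawl block from the question, the selector would then be used
# roughly like this (untested against Wombat):
#
#   application "css=#{selector}", :iterator do
#     href 'css=div.do_cat_ads_image a @href'
#     name 'css=div.do_cat_ads_detail a'
#   end
```

Writing the selector string out by hand works just as well; the helper only makes it harder to forget the :not([style]) on one of the classes.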