I need to write a scraper in Java and Groovy.
Is there a library that can parse HTML documents and select the information I need with simple CSS selectors, instead of walking the whole document tree and picking out elements manually? Something like Nokogiri for Ruby, to give you an idea of what I'm looking for.
Thanks in advance!
I do something like this by loading the page with Qt WebKit and injecting jQuery into it, so I can select elements with CSS selectors.
It's a hack, but it works well for my use case. I needed a solution that requires no configuration: just sudo apt-get install libqt4-webkit and you're ready to go.