I want to do some screen scraping, ideally using CSS selectors and not XPath. Is there a library similar to ones in Ruby or Python?
Share
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
There are dozen of screen scraping library written in Java. Just to cite a few :
And many more at HTML Screen Scraping Tools written in Java. But these are IMO the best to deal with any kind of content (understand all kind of crap) as I mentioned in this previous answer. This might not be an issue for you though.
Just in case, maybe check out the thread Nokogiri pure Java status.
Update: A new project has been released (the 2010-01-31), jsoup, which offers a selector-syntax to find elements. See its website for more details and/or this answer from its author.