Just curious: what do you find to be the best tools for automated screen scraping these days? Is the HTML Agility Pack for .NET a good option? And what do you do about scraping sites that use a lot of AJAX?
I find that if the page has a fairly static layout, then the HTML Agility Pack is perfect for getting all the data I need. I haven’t run into a single page that it couldn’t handle, and it has always given me the results I wanted.
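To give a rough idea of what the static case looks like in code, here is a minimal sketch. Note that it uses Python's standard-library `html.parser` as a stand-in rather than the HTML Agility Pack itself, so it only illustrates the general approach (parse the downloaded markup, then pick out the elements you care about). The names `LinkExtractor`, `extract_links`, and `SAMPLE_HTML` are made up for this example, and the sample markup is hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Only track anchors that actually carry an href attribute.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Accumulate text only while inside a tracked anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links as (href, text) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup standing in for a page that was already downloaded.
SAMPLE_HTML = """
<html><body>
  <ul id="results">
    <li><a href="/items/1">First <b>item</b></a></li>
    <li><a href="/items/2">Second item</a></li>
    <li><a name="no-href">Anchor without a link</a></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

This only works because everything of interest is already present in the markup that came back from the server. Nothing here runs scripts, so content injected by JavaScript after load would simply be missing.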
If you find that the page is rendered with a great deal of dynamic code, you’re going to have to do more than just download the page; you’ll have to actually execute it.
To do that, you’ll need something like the WebKit .NET library (a .NET wrapper around the WebKit rendering engine), which will allow you to download the page and actually execute its JavaScript as well. Then, once you are sure the document has been rendered completely, you can pull the details you need from the page.