Is there a good tutorial or sample for learning HTTP web scraping? How do I start developing a tool that can search certain websites and download specific information, so that I can collect it automatically and then analyse it? Thanks!
A tool commonly recommended for this is the Html Agility Pack, a .NET library. It takes malformed HTML and builds a traversable DOM from it (which it can also write out as well-formed XML), so it is very useful for the markup you find in the wild, as opposed to approaches like regular expressions, which tend to break as soon as a page's markup changes.
There are some examples and the API documentation here:
http://html-agility-pack.net/api
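To make the "tolerant parser instead of RegEx" point concrete, here is a minimal sketch of the same idea. Note that it is not the Html Agility Pack API: it uses Python's standard-library `html.parser` as a stand-in, and the names `LinkCollector`, `extract_links` and `SAMPLE` are made up for this illustration. It pulls link targets and link text out of deliberately sloppy markup (unclosed tags, mixed quoting, upper-case tag names).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs from <a> tags, tolerating sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from HTMLParser.
        if tag == "a":
            self._flush()  # treat an unclosed <a> as ended by the next <a>
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def _flush(self):
        # Store the pending link, if there is one, and reset the state.
        if self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
        self._href = None
        self._text = []

    def close(self):
        super().close()
        self._flush()  # keep a link that was still open at end of input


def extract_links(html):
    """Return a list of (href, text) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately malformed: unclosed <p>, <a> and <b>, mixed quoting, no </html>.
SAMPLE = """
<html><body>
<p>Unclosed paragraph
<a href="/first">First link
<a href='/second'>Second <b>link</a>
<A HREF=/third>Third</A>
</body>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, "->", text)
```

In a real scraper the HTML string would come from an HTTP response rather than a constant; the parsing step stays the same. With the Html Agility Pack the equivalent step would be loading the document and querying its DOM, as described in the API documentation linked above.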
Some useful links: