I would like to know what the best open-source library is for crawling and analyzing websites. One example would be a crawler for property agencies, where I would like to grab information from a number of sites and aggregate it into my own site. For this I need to crawl the sites and extract the property ads.
I do a lot of scraping, using the excellent Python packages urllib2, mechanize, and BeautifulSoup.
I also suggest looking at lxml and Scrapy, though I don’t use them currently (I'm still planning to try out Scrapy).
Perl also has great facilities for scraping.
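To give a rough idea of the fetch-and-parse approach on the Python side, here is a minimal sketch that uses only the Python 3 standard library. Note the substitutions: `urllib.request` is the Python 3 successor to urllib2, and `html.parser` stands in for BeautifulSoup so the example runs without installing anything. The markup, the class names `listing`, `title` and `price`, and the names `ListingParser`, `extract_listings` and `fetch_listings` are all hypothetical; a real agency site will need its own selectors.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ListingParser(HTMLParser):
    """Collect title/price pairs from <div class="listing"> blocks.

    Assumes hypothetical markup with no nested <div> inside a listing.
    """

    def __init__(self):
        super().__init__()
        self.listings = []     # finished ads, one dict per listing
        self._current = None   # ad being built, or None outside a listing
        self._field = None     # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "listing" in classes:
            self._current = {}
        elif self._current is not None and tag == "span":
            for name in ("title", "price"):
                if name in classes:
                    self._field = name

    def handle_data(self, data):
        # Only keep text that sits inside a recognised field of a listing.
        if self._current is not None and self._field:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.listings.append(self._current)
            self._current = None


def extract_listings(html):
    """Parse an HTML string and return a list of ad dictionaries."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.listings


def fetch_listings(url):
    """Download a page (network call) and extract its listings."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_listings(html)


# Hypothetical sample page, used here instead of a real site.
SAMPLE = """
<div class="listing"><span class="title">3-bed house</span><span class="price">250000</span></div>
<div class="listing"><span class="title">Studio flat</span><span class="price">90000</span></div>
"""

if __name__ == "__main__":
    for ad in extract_listings(SAMPLE):
        print(ad["title"], ad["price"])
```

Aggregating several agencies would then mostly be a matter of calling something like `fetch_listings` once per site, with a parser tuned to each site's markup, and merging the resulting lists. With BeautifulSoup or lxml the parsing class would likely shrink to a few selector calls.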