Does anybody know of a program where I can enter a domain name and have it crawl the entire domain, downloading the HTML source of every page for me? If a page contains links, it should follow only the ones that point to pages on the same domain, not to external domains.
Look at Scrapy for Python:
http://www.scrapy.org
or crawler4j for Java:
http://code.google.com/p/crawler4j/
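If you would rather see how the "stay on one domain" crawl works before adopting a framework, here is a minimal sketch using only the Python standard library. It is not Scrapy or crawler4j code. The names `crawl_domain`, `fetch_html`, and `LinkParser` are hypothetical, and the sketch assumes that "same domain" means an identical host (so `www.example.com` and `example.com` count as different), that pages decode as UTF-8, and that a `max_pages` cap is acceptable. It also ignores robots.txt and rate limiting, which a real crawler should respect.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: returns the page as text, or None if it is not HTML.
    Assumes UTF-8; undecodable bytes are replaced rather than raising."""
    with urlopen(url, timeout=10) as resp:
        if "text/html" not in resp.headers.get("Content-Type", ""):
            return None
        return resp.read().decode("utf-8", errors="replace")


def crawl_domain(start_url, fetch=fetch_html, max_pages=500):
    """Breadth-first crawl that only follows links on start_url's host.
    Returns a dict mapping each downloaded URL to its HTML source."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # network/HTTP errors: skip this page
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page, drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(link)
            # Only queue http(s) links on the same host that we have not seen.
            if parsed.scheme in ("http", "https") and parsed.netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` as a parameter lets you test the link-filtering logic against an in-memory set of pages before pointing it at a live site, e.g. `pages = crawl_domain("http://example.com/")`. In Scrapy, the equivalent restriction is the spider's `allowed_domains` attribute, which drops requests to other domains for you.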