I am working on a project that requires me to collect a large list of URLs to websites about certain topics. I would like to write a script that will use google to search specific terms, then save the URLs from the results to a file. How would I go about doing this? I have used a module called xgoogle, but it always returned no results.
I am using Python 2.6 on Windows 7.
google has an API library. I’d recommend you use that: http://code.google.com/apis/ajaxsearch/
it’s a restful API, which means its easy to grab results via python/js. You’re limited to 32 results, I think, but that should be enough. it’ll return a nice structured object that you’ll be able to work with without having to do anything with html parsing.
if you wanted to ‘crawl’, you could then use urllib to grab each of those urls and get THEIR contents, and the urls they refer to, and on and on.