I have been Googling for some time, but I guess I am using the wrong keywords. Does anyone know the URI that lets me request permission from Facebook to crawl their network? The last time I was doing this in Python, someone suggested I look at it, but I couldn't find that post either.
Amazingly enough, that’s given in their robots.txt.
The link you’re looking for is this one:
http://www.facebook.com/apps/site_scraping_tos.php
If you’re not a huge organization already, don’t expect to be explicitly whitelisted there. If you’re not explicitly whitelisted, you’re not allowed to crawl at all, according to the robots.txt and the TOS. You must use the API instead.
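If you want to check programmatically what a robots.txt allows before writing any crawler, Python's standard library ships `urllib.robotparser`. The sketch below is a minimal illustration, assuming a made-up whitelist-style robots.txt (one named crawler allowed, everyone else blocked by `User-agent: *` / `Disallow: /`). It is not Facebook's actual file, and `is_allowed` and `MyCrawler` are hypothetical names. Note that robots.txt is only half the story here: the TOS still applies even where robots.txt would let you in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in a whitelist style (NOT Facebook's real file):
# one named crawler may fetch everything, every other agent is blocked.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/some/page"
    print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", page))      # True
    print(is_allowed(SAMPLE_ROBOTS_TXT, "MyCrawler/1.0", page))  # False
```

To check a live site instead, call `parser.set_url("http://www.facebook.com/robots.txt")` followed by `parser.read()` before `can_fetch()`.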
Don’t even think about pretending to be one of the whitelisted crawlers. Facebook filters each whitelisted crawler by IP, and anything else that looks at all like crawling gets an instant perma-ban. For a while, users who simply clicked too fast could occasionally run into this.