I had this code that helped me fetch the URL of an actor's IMDb page by searching Google for "IMDB+Actor name" and giving me the URL of his IMDb profile page.
It worked fine until 5 minutes ago, and then it suddenly stopped working. Is there a daily limit on Google queries (I would find that very strange!), or did I change something in my code without noticing (in which case, can you spot what's wrong)?
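```php
// These are methods of a class (not shown), hence the $this-> calls.
function getIMDbUrlFromGoogle($title)
{
    $url = "http://www.google.com/search?q=imdb+" . rawurlencode($title);
    echo $url; // debug: show the query being sent
    $html = $this->geturl($url);
    if ($html === false) {
        return NULL; // request failed or was refused (e.g. throttled)
    }
    // Escape the dots and accept both http and https IMDb links.
    $urls = $this->match_all('/<a href="(https?:\/\/www\.imdb\.com\/name\/nm[^"]*)".*?>.*?<\/a>/ms', $html, 1);
    if (!isset($urls[0])) {
        return NULL;
    }
    return $urls[0]; // return the first IMDb result
}

function geturl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1");
    $html = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Treat anything but a 200 as a failure: a throttled client typically
    // gets an error status or redirect to a CAPTCHA page, which the regex
    // would never match anyway.
    if ($html === false || $status !== 200) {
        return false;
    }
    return $html;
}

function match_all($regex, $str, $i = 0)
{
    if (preg_match_all($regex, $str, $matches) === false) {
        return false;
    }
    return $matches[$i];
}
```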
Google will, in fact, throttle you if you send queries too fast or send too many. For example, their SOAP API limits you to 1,000 queries a day. Either add a wait between queries, or use a service that invites this kind of use, such as Yahoo's BOSS: http://developer.yahoo.com/search/boss/
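A minimal sketch of the "add a wait" advice, in PHP to match the question. Assumptions not taken from the answer: throttling is taken to show up as HTTP 429 or 503, and the fetcher is any callable returning `[status, body]`. `fetchWithBackoff` and `curlFetch` are hypothetical names. The sleep function is injectable so the retry logic can be checked without actually waiting.

```php
<?php
// Sketch only. Assumption: a 429 or 503 status means "slow down".
// $fetch is any callable returning [int $status, string $body].
function fetchWithBackoff(
    callable $fetch,
    string $url,
    int $maxAttempts = 4,
    int $baseDelayMs = 1000,
    ?callable $sleepMs = null
): ?string {
    // Default sleeper: usleep() takes microseconds, so convert from ms.
    $sleepMs = $sleepMs ?? function (int $ms): void { usleep($ms * 1000); };

    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        [$status, $body] = $fetch($url);
        if ($status === 200) {
            return $body;
        }
        if ($status !== 429 && $status !== 503) {
            return null; // some other error: retrying will not help
        }
        if ($attempt < $maxAttempts - 1) {
            // Exponential backoff: 1 s, 2 s, 4 s, ... with the defaults.
            $sleepMs($baseDelayMs * (2 ** $attempt));
        }
    }
    return null; // still throttled after every attempt
}

// Hypothetical curl-based fetcher returning [HTTP status, body].
function curlFetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return [$status, $body === false ? '' : $body];
}
```

For example, `fetchWithBackoff('curlFetch', $url)` retries after roughly 1 s, 2 s and 4 s before giving up. Pausing a second or two between different actor lookups as well keeps the overall request rate down.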
ETA: I really, really like BOSS, and I'm a Google fangirl. It gives you a lot of resources, clean data, and flexibility; Google never gave us anything like this, which is too bad.
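If requests are getting through but the regex in the question finds nothing, the other suspect is a change in the results markup. Below is a sketch of a more tolerant extractor. It assumes (unverified) that result links may be direct `http`/`https` IMDb URLs or wrapped in a redirect such as `/url?q=...`, possibly percent-encoded. `extractImdbNameUrls` is a hypothetical name; it returns canonical `https://www.imdb.com/name/nm.../` URLs in page order, without duplicates.

```php
<?php
// Sketch: pull IMDb actor profile URLs out of a search-results page.
// Assumption: links may be direct (http or https) or wrapped in a redirect
// like /url?q=https://www.imdb.com/name/nm0000138/&sa=..., maybe percent-encoded.
function extractImdbNameUrls(string $html): array
{
    // Decode %3A%2F%2F etc. so encoded redirect targets match too.
    $text = rawurldecode($html);
    // nm IDs are digits only, so \d+ stops cleanly before "/", "&" or a quote.
    $pattern = '~https?://(?:www\.|m\.)?imdb\.com/name/(nm\d+)~i';
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }
    $urls = [];
    foreach ($matches[1] as $id) {
        // Normalise to one canonical form and keep first-seen order.
        $url = "https://www.imdb.com/name/$id/";
        if (!in_array($url, $urls, true)) {
            $urls[] = $url;
        }
    }
    return $urls;
}

// Usage: the first element, if any, is the top IMDb result on the page.
$html = '<a href="/url?q=https://www.imdb.com/name/nm0000138/&amp;sa=U">x</a>';
print_r(extractImdbNameUrls($html));
```

In the question's `getIMDbUrlFromGoogle`, this would replace the `match_all` call, with `$urls[0]` still taken as the first result.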