I have written a PHP script based on a piece of code I found using Google. Its purpose is to check a particular site’s position in Google for a given keyword. First, it prepares an appropriate URL to query Google (something like “http://www.google.com/search?q=the+keyword&ie=utf-8&oe=utf-8&num=50”), then it downloads the source of the page at that URL. After that, it determines the position using regular expressions and knowledge of which div classes Google uses for results.
The script works fine when the URL I want to download from is in the “google.com” domain. But since it’s intended to check positions for Polish users, I would like it to use “google.pl”. I wouldn’t care, but the search results can vary a lot between the two (by more than 100 positions). Unfortunately, when I try to use the “pl” domain, cURL just doesn’t return anything (it waits for the timeout first). However, when I ran my script on another server, it worked perfectly with both the “google.com” and “google.pl” domains. Do you have any idea why something like this can happen? Is it possible that my server was banned from querying the “google.pl” domain?
Here is my cURL code:
    private function cURL($url)
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        return curl_exec($ch);
        curl_close($ch);
    }
First of all, I cannot reproduce your problem. I used the following 3 cURL commands to simulate your situation:
The first one is .com, because this should work as your reference point. Positive.
The second one is .pl, because this is where you are encountering problems. This also just works for me.
The third one is .nl, because this is where I live (so basically what .pl is for you). This too just works for me.
I’m not sure, but this could be one possible explanation:
With google.nl, for example, I still end up at google.com/search?q=... (the only difference is the additional lang-param).
google.nl/search?q=... redirects to google.com (302). Its actual body is empty.
If this is true (which I’ll check now), you need to use google.com as the domain and add an additional lang-param, instead of using google.pl.
The reason your other server does the trick could be that cURL’s configuration differs, or that the cURL version isn’t the same.
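To check the redirect theory on the failing server, something along these lines might help. It is only a sketch: the function names are hypothetical, and `hl` as the name of the lang-param is my assumption. It builds a google.com URL with a language hint, follows redirects, and reports the status code, final URL, and cURL error so that an empty 302 body can be told apart from a timeout.

```php
<?php
// Hypothetical helper: builds a google.com search URL with a language hint.
// ASSUMPTION: "hl" is the language parameter meant by "lang-param" above.
function buildSearchUrl(string $keyword, string $lang = 'pl', int $num = 50): string
{
    $query = http_build_query([
        'q'   => $keyword,
        'hl'  => $lang,
        'ie'  => 'utf-8',
        'oe'  => 'utf-8',
        'num' => $num,
    ], '', '&');

    return 'http://www.google.com/search?' . $query;
}

// Hypothetical helper: fetches a URL and returns the body together with
// diagnostics, so a redirect with an empty body is visible as such.
function fetchWithDiagnostics(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 302 redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // but not endlessly

    $body = curl_exec($ch); // string on success, false on failure
    $result = [
        'body'      => $body,
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'error'     => curl_error($ch), // empty string if no error occurred
    ];
    curl_close($ch);

    return $result;
}
```

Calling `fetchWithDiagnostics(buildSearchUrl('the keyword'))` on both servers and comparing `http_code`, `final_url`, and `error` should show whether the failing server is hitting a redirect, a block, or a plain connection timeout.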
Google also blocks cURL’s default user-agent string, so I’d suggest changing it to something like:
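A minimal sketch of setting the user agent with `CURLOPT_USERAGENT`. The user-agent value, the constant, and the function name below are placeholders I made up for illustration; any browser-like string could be used instead.

```php
<?php
// ASSUMPTION: an illustrative browser-like user-agent string, not a required value.
const EXAMPLE_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0';

// Hypothetical helper: returns the cURL options from the question
// plus an explicit user-agent string.
function browserLikeCurlOptions(string $userAgent = EXAMPLE_USER_AGENT): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_USERAGENT      => $userAgent,
    ];
}
```

The options can then be applied in one call with `curl_setopt_array($ch, browserLikeCurlOptions());` before `curl_exec($ch)`.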
This has nothing to do with the problems you’re encountering, but you don’t actually close your cURL handle, since you return before you close it (everything after return ... will be ‘skipped’).
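One way to fix that is to store the result first, close the handle, and only then return. This is a sketch written as a standalone function (the name `fetchUrl` is hypothetical) rather than the private method from the question:

```php
<?php
/**
 * Downloads the given URL and returns its body.
 *
 * @return string|false The response body, or false if the transfer failed.
 */
function fetchUrl(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    $result = curl_exec($ch); // keep the result instead of returning right away
    curl_close($ch);          // this line is now actually reached

    return $result;
}
```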