I’m using fsockopen in a small cronjob to read and parse feeds from different servers. For the most part, this works very well. Yet on some servers, I get very weird lines in the response, like this:
<language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> 11 <item> <title> 1f July 8th, 2010</title> <link> 32 http://darkencomic.com/?p=2406</link> <comments> 3e
But when I open the feed in e.g. notepad++, it works just fine, showing:
<language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item> <title>July 8th, 2010</title> <link>http://darkencomic.com/?p=2406</link> <comments>
…just to show an excerpt. So, am I doing anything wrong here or is this beyond my control? I’m grateful for any idea to fix this.
Here’s part of the code I’m using to retrieve the feeds:
$fp = @fsockopen($url["host"], 80, $errno, $errstr, 5);
if (!$fp) {
    throw new UrlException("($errno) $errstr ~~~ on opening ".$url["host"]);
} else {
    // Build and send a minimal GET request
    $out = "GET ".$path." HTTP/1.1\r\n"
        ."Host: ".$url["host"]."\r\n"
        ."Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    // Read the raw response (headers and body) until the server closes the connection
    $contents = '';
    while (!feof($fp)) {
        $contents .= stream_get_contents($fp, 128);
    }
    fclose($fp);
}
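This looks like HTTP chunked transfer encoding, which is a way HTTP has of splitting a response into several small parts. Each part is preceded by its size in hexadecimal, and those sizes are the stray values (11, 1f, 32, 3e) showing up in your output. Because your request declares HTTP/1.1, servers are allowed to reply this way, which is why only some feeds are affected.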
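To illustrate what decoding such a response involves, here is a minimal sketch of a chunked-body decoder. The function name `decode_chunked` is hypothetical (it is not part of PHP or of the code in the question), and the sketch assumes the headers have already been separated from the body and that the response really is chunked. It ignores trailers and does no thorough error handling.

```php
<?php
// Hypothetical helper, for illustration only: decodes an HTTP body that was
// sent with "Transfer-Encoding: chunked". Expects the body without headers.
function decode_chunked(string $body): string
{
    $decoded = '';
    $offset  = 0;
    $length  = strlen($body);

    while ($offset < $length) {
        // Each chunk starts with its size in hexadecimal, terminated by CRLF.
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            break; // truncated or malformed input: stop with what we have
        }
        $sizeLine = substr($body, $offset, $lineEnd - $offset);

        // Optional chunk extensions may follow the size after a ';'.
        $sizeHex = trim(explode(';', $sizeLine, 2)[0]);
        $size    = (int) hexdec($sizeHex);
        if ($size === 0) {
            break; // a zero-sized chunk marks the end of the body
        }

        // Copy the chunk data, then skip the data and the CRLF that follows it.
        $decoded .= substr($body, $lineEnd + 2, $size);
        $offset   = $lineEnd + 2 + $size + 2;
    }

    return $decoded;
}
```

With the `$contents` variable from the question, one would first split headers from body, for example with `explode("\r\n\r\n", $contents, 2)`, and only call the decoder when the headers contain `Transfer-Encoding: chunked`.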
When working with `fsockopen` and the like, you have to deal with the HTTP protocol yourself, which is not always as easy as one might think 😉
A solution to avoid having to deal with such stuff would be to use something like curl: it already knows the HTTP protocol, which means you won’t have to re-invent the wheel 😉
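As a rough sketch of the curl approach, assuming PHP's cURL extension is installed: the helper name `fetch_feed` and the timeout values are illustrative assumptions, not something from the question. cURL handles chunked responses itself, so the returned string is the plain body.

```php
<?php
// Hypothetical helper, for illustration only: fetches a URL with PHP's cURL
// extension and returns the response body as a string.
function fetch_feed(string $feedUrl, int $timeoutSeconds = 5): string
{
    $ch = curl_init($feedUrl);
    if ($ch === false) {
        throw new RuntimeException("Could not initialise cURL for $feedUrl");
    }

    // Return the body from curl_exec() instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow redirects, which feeds sometimes use.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Assumed limits: connection timeout and overall transfer timeout.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds * 2);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    return $body;
}
```

It could then be called as `$contents = fetch_feed('http://' . $url['host'] . $path);`, reusing the `$url` and `$path` variables from the question.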