I’m attempting to retrieve images from web pages, and it has been working well so far, except that one of the sites I am looking at serves its images as Content-Type: text/html, causing my script to reject them as not real images.
This is the code snippet I am using to determine content-type:
$accepted_mime = array('image/gif', 'image/jpeg', 'image/jpg', 'image/png');
$headers = get_headers($image);
// Find the Content-Type header
$num_headers = sizeOf($headers);
for ($x = 0; $x < $num_headers; $x++) {
    preg_match('/^Content-Type: (.+)$/', $headers[$x], $mime_type);
    if (isset($mime_type[1]) && in_array($mime_type[1], $accepted_mime)) {
        return true;
    }
}
For sites I’ve tried, they return properly (results such as image/gif, image/png, etc), but mpaa.org seems to serve their images with type text/html. Is this normal?
I added a print_r to see the header array returned by get_headers:
Array
(
    [0] => http://www.mpaa.org/templates/images/header_mpaa_logo.gif
    [1] => Array
        (
            [0] => HTTP/1.1 200 OK
            [1] => Server: nginx/1.2.0
            [2] => Date: Sat, 17 Nov 2012 17:19:06 GMT
            [3] => Content-Type: text/html
            [4] => Connection: close
            [5] => P3P: CP="NON DSP COR ADMa OUR IND UNI COM NAV INT"
            [6] => Cache-Control: no-cache, no-store, must-revalidate
            [7] => Pragma: no-cache
        )
)
I could easily add text/html to my list of accepted content-types, but that’s definitely not the ideal solution 😉 Does anyone know why mpaa.org serves their images with this Content-Type? Is it regular practice to do so (perhaps with legacy websites/servers)?
Thanks 🙂
The wonderful MPAA is using user-agent sniffing or checking cookies to determine if your browser supports JavaScript. Since you are not specifying a user-agent string or sending cookies, they assume you don’t have JavaScript and return a page saying that, instead of the original image.
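As a rough sketch of how to test that theory from PHP, the snippet below passes a browser-like User-Agent to get_headers through a stream context and then reads the Content-Type case-insensitively. The helper names fetch_headers_with_user_agent and extract_content_type are my own, not part of PHP, and it assumes PHP 7.1 or later, where get_headers accepts a context argument. If the site is checking cookies rather than the user agent, this alone will not be enough.

```php
<?php
// Hypothetical helper: request headers for $url while identifying as $userAgent.
// Assumes PHP >= 7.1 (context parameter of get_headers) and allow_url_fopen enabled.
function fetch_headers_with_user_agent(string $url, string $userAgent): array
{
    $context = stream_context_create([
        'http' => [
            'method'     => 'HEAD',      // only the headers are needed
            'user_agent' => $userAgent,  // sent as the User-Agent request header
        ],
    ]);

    $headers = @get_headers($url, false, $context);

    // get_headers returns false on failure; normalise that to an empty list.
    return $headers === false ? [] : $headers;
}

// Hypothetical helper: return the lower-cased media type from a list of raw
// header lines, or null if no Content-Type header is present.
function extract_content_type(array $headers): ?string
{
    $type = null;
    foreach ($headers as $header) {
        // Header names are case-insensitive, and the value may carry
        // parameters such as "; charset=UTF-8" that should be ignored.
        if (is_string($header)
            && preg_match('/^Content-Type:\s*([^;\s]+)/i', $header, $match)) {
            // Keep the last match so the final response wins after redirects.
            $type = strtolower($match[1]);
        }
    }
    return $type;
}
```

Something like extract_content_type(fetch_headers_with_user_agent($image, 'Mozilla/5.0 (compatible; ImageFetcher/1.0)')) could then stand in for the loop in the question, with the result checked against $accepted_mime. The User-Agent string here is only a placeholder; which strings the site accepts is something you would have to find out by trying.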
If you load this with a browser, you’ll note that you do get image/gif, and the image you are after: http://www.mpaa.org/templates/images/header_mpaa_logo.gif
If you make that same request with cURL or Fiddler, or some other oddball user-agent string: