I’m attempting to retrieve images from a web page, and it has been working

Asked: June 14, 2026

I’m attempting to retrieve images from a web page, and it has been working well so far, except that one of the sites I am looking at is serving images as Content-Type: text/html, causing my script to reject them as not real images.

This is the code snippet I am using to determine content-type:

$accepted_mime = array('image/gif', 'image/jpeg', 'image/jpg', 'image/png');    
$headers = get_headers($image);

// Find the Content-Type header
$num_headers = sizeOf($headers);
for($x=0;$x<$num_headers;$x++) {
    preg_match('/^Content-Type: (.+)$/', $headers[$x], $mime_type);
    if (isset($mime_type[1]) && in_array($mime_type[1], $accepted_mime)) {
        return true;
    }
}

The other sites I’ve tried return the proper type (results such as image/gif, image/png, etc.), but mpaa.org seems to serve its images with type text/html. Is this normal?

I added a print_r to see the header array returned by get_headers:

Array
(
    [0] => http://www.mpaa.org/templates/images/header_mpaa_logo.gif
    [1] => Array
        (
            [0] => HTTP/1.1 200 OK
            [1] => Server: nginx/1.2.0
            [2] => Date: Sat, 17 Nov 2012 17:19:06 GMT
            [3] => Content-Type: text/html
            [4] => Connection: close
            [5] => P3P: CP="NON DSP COR ADMa OUR IND UNI COM NAV INT"
            [6] => Cache-Control: no-cache, no-store, must-revalidate
            [7] => Pragma: no-cache
        )

)

I could easily add text/html to my list of accepted content-types, but that’s definitely not the ideal solution 😉 Does anyone know why mpaa.org serves their images with this Content-Type? Is it regular practice to do so (perhaps with legacy websites/servers)?

Thanks 🙂

1 Answer

Editorial Team · Answer 1 · 2026-06-14T12:07:43+00:00

The wonderful MPAA is using user-agent sniffing or checking cookies to determine if your browser supports JavaScript. Since you are not specifying a user-agent string or sending cookies, they assume you don’t have JavaScript and return a page saying that, instead of the original image.

If you load this with a browser, you’ll note that you do get image/gif, and the image you are after: http://www.mpaa.org/templates/images/header_mpaa_logo.gif

If you make that same request with cURL or Fiddler, or with some other oddball user-agent string, you get this HTML page back instead of the image:

This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser.
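
One way to act on this is to send a browser-like User-Agent with the request, by passing a stream context to get_headers (the context argument is available from PHP 7.1). The following is only a minimal sketch: the helper names make_browser_context, find_content_type and is_accepted_image are made up for illustration, the User-Agent value is an arbitrary example, and it is an assumption that a User-Agent alone satisfies the site. If the site also checks cookies, this will not be enough.

```php
<?php
// Hypothetical helper: build a stream context that sends the given
// User-Agent string with HTTP requests.
function make_browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; arbitrary example value
        ],
    ]);
}

// Hypothetical helper: return the media type of the last Content-Type
// header in a flat header list (the shape get_headers returns),
// lowercased and without parameters such as "; charset=...".
// Returns null when no Content-Type header is present.
function find_content_type(array $headers): ?string
{
    $type = null;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('/^Content-Type:\s*([^;\s]+)/i', $line, $m)) {
            // When redirects are followed, later responses overwrite earlier ones.
            $type = strtolower($m[1]);
        }
    }
    return $type;
}

// Hypothetical helper: true when the URL answers with an accepted image type.
function is_accepted_image(string $url, array $acceptedMime, string $userAgent): bool
{
    $headers = get_headers($url, false, make_browser_context($userAgent));
    if ($headers === false) {
        return false; // the request itself failed
    }
    return in_array(find_content_type($headers), $acceptedMime, true);
}

// Example call (performs a real HTTP request, so it is left commented out):
// var_dump(is_accepted_image($image, $accepted_mime, 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'));
```

Matching the header name case-insensitively and dropping any parameters also keeps a response such as "content-type: image/png; charset=binary" from being rejected by the in_array check.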
