When scraping e.g. http://baidu.com, the script doesn’t follow the <meta http-equiv="refresh"> redirect

Question

Asked: May 28, 2026 (2026-05-28T01:25:55+00:00)

When scraping a site such as http://baidu.com, the script doesn’t follow the <meta http-equiv="refresh"> redirect. The code I’m running:

require_once 'HTTP/Request2.php';

$request = new HTTP_Request2("http://baidu.com", HTTP_Request2::METHOD_GET);
$request->setConfig(array(
    'adapter' => 'HTTP_Request2_Adapter_Curl',
    'connect_timeout' => 15,
    'timeout' => 30,
    'follow_redirects' => TRUE,
    'max_redirects' => 10,
));

$html = ''; // stays empty if the request fails or the status is not 200
try {
    $response = $request->send();
    if (200 == $response->getStatus()) {

        $html = $response->getBody();
    } else {
        echo 'Unexpected HTTP status: ' . $response->getStatus() . ' ' .
        $response->getReasonPhrase();
    }
} catch (HTTP_Request2_Exception $e) {
    echo 'Error: ' . $e->getMessage();
}

print $html;

outputs:

<html>
<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">
</html>

Is there a way to make it follow this redirect, to get proper html in $response->getBody()?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T01:25:56+00:00

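The PEAR library does follow HTTP redirects, because those are signalled in the response headers (a 3xx status code with a Location header). The example in your question is an HTML meta refresh, which is a different mechanism: it lives in the response body, so the HTTP client never treats it as a redirect.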

What you’ll want to do is read the response to the HTTP request made via PEAR and parse the “meta refresh” tag, then make a second request to the URI that you managed to scrape out of the first request.
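As a rough sketch of the parsing step described above: the helper below pulls the target URL out of a meta refresh tag. The name extractMetaRefreshUrl is hypothetical (it is not part of HTTP_Request2), and the sketch assumes PHP 7.1 or later and a quoted content attribute.

```php
<?php
// Hypothetical helper (not part of HTTP_Request2): returns the target URL of a
// <meta http-equiv="refresh"> tag, or null if the page has none.
function extractMetaRefreshUrl(string $html): ?string
{
    // Collect every <meta ...> tag; attribute order inside the tag does not matter.
    if (!preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    foreach ($tags[0] as $tag) {
        // Only tags that declare http-equiv=refresh are relevant.
        if (!preg_match('/\bhttp-equiv\s*=\s*["\']?refresh\b/i', $tag)) {
            continue;
        }
        // Assumption: the content attribute is quoted.
        if (!preg_match('/\bcontent\s*=\s*(["\'])(.*?)\1/is', $tag, $content)) {
            continue;
        }
        // content looks like "0;url=http://www.baidu.com/": a delay, then the target.
        $valuePattern = '/^\s*\d+\s*[;,]\s*(?:url\s*=\s*)?["\']?([^"\']+?)["\']?\s*$/i';
        if (preg_match($valuePattern, $content[2], $m)) {
            // Attribute values may contain entities such as &amp;
            return html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
        }
    }
    return null;
}
```

Calling extractMetaRefreshUrl($response->getBody()) on the page shown in the question should give http://www.baidu.com/, which can then be passed to a second request. Relative targets are returned as-is and would still need to be resolved against the original URL.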

Below is an example of a function that does this, taken from a comment on the PHP manual.

function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
    $result = false;

    // @ suppresses warnings; file_get_contents() returns false on failure
    $contents = @file_get_contents($url);

    // Check if we need to go somewhere else
    if (is_string($contents))
    {
        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);

        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
        {
            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
            {
                // Follow the meta refresh target and return that page instead
                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
            }

            // Redirect limit reached: give up
            $result = false;
        }
        else
        {
            // No meta refresh tag: this is the final page
            $result = $contents;
        }
    }

    return $result;
}

This snippet was found here: http://php.net/manual/en/function.get-meta-tags.php

Note that getUrlContents() follows the meta refresh itself and returns the body of the final page, not the redirect URL, so you can use its result directly:

// fetch the page, following any meta refresh redirects along the way
$html = getUrlContents('http://baidu.com');

You may want to re-implement the getUrlContents function so that it uses PEAR to fetch each URL, if that is your preferred method for making HTTP calls.
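One possible shape for such a re-implementation is sketched below. It is an illustration under assumptions, not a tested drop-in: fetchFollowingMetaRefresh and $pearFetch are hypothetical names, the pattern expects http-equiv to appear before content (as in the regex above), and redirect targets are assumed to be absolute URLs. The fetching step is passed in as a callable, so the loop itself does not depend on any particular HTTP library.

```php
<?php
// Hypothetical sketch: follow <meta http-equiv="refresh"> hops using any fetcher.
// $fetch is a callable that takes a URL and returns the response body as a string.
function fetchFollowingMetaRefresh(string $url, callable $fetch, int $maxRedirects = 10): string
{
    // Assumption: http-equiv comes before content inside the tag.
    $pattern = '/<meta\b[^>]*\bhttp-equiv\s*=\s*["\']?refresh\b[^>]*'
             . '\bcontent\s*=\s*["\']?\s*\d+\s*;\s*url\s*=\s*["\']?([^"\'>\s]+)/i';

    for ($hop = 0; $hop <= $maxRedirects; $hop++) {
        $body = $fetch($url);
        if (!preg_match($pattern, $body, $m)) {
            return $body; // no meta refresh: this is the final page
        }
        // Assumption: the target is an absolute URL; relative ones are not resolved.
        $url = html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
    }
    throw new RuntimeException("More than $maxRedirects meta refresh redirects");
}

// Fetcher built on HTTP_Request2, mirroring the configuration from the question.
$pearFetch = function (string $url): string {
    require_once 'HTTP/Request2.php'; // loaded lazily, only when the fetcher runs
    $request = new HTTP_Request2($url, HTTP_Request2::METHOD_GET);
    $request->setConfig(array(
        'adapter'          => 'HTTP_Request2_Adapter_Curl',
        'follow_redirects' => true,
        'max_redirects'    => 10,
    ));
    $response = $request->send();
    if (200 != $response->getStatus()) {
        throw new RuntimeException('Unexpected HTTP status: ' . $response->getStatus());
    }
    return $response->getBody();
};

// Usage: $html = fetchFollowingMetaRefresh('http://baidu.com', $pearFetch);
```

Keeping the fetcher separate means HTTP-level redirects are still handled by HTTP_Request2, while the loop only deals with redirects declared in the HTML body.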
