I’ve been looking around the internet hoping that this is possible, I basically need

Question

Asked: June 3, 2026 (2026-06-03T07:16:39+00:00)

I’ve been looking around the internet to find out whether this is possible: I need to get just the title of a web page and nothing else.

Web crawlers can take a long time because they have to load each page in full before examining it, which is inefficient for what I am trying to achieve. Here’s what I have so far:

php code

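$url = 'http://www.ebay.com/itm/300702997750#ht_500wt_1156';
$str = file_get_contents($url); // returns false on failure
$title = '';

// Non-greedy, case-insensitive match in which "." also matches newlines,
// so an upper-case or multi-line <title> element is still found.
if (is_string($str) && preg_match('/<title[^>]*>(.*?)<\/title>/is', $str, $titleArr)) {
    $title = trim($titleArr[1]);
}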

I want to know whether it is possible to fetch only part of a page (for example, the first 2000 characters).

Any help would be appreciated. Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-06-03T07:16:41+00:00

You could use substr to grab just the first 1000 characters, but that still downloads the whole page first. Alternatively, you could use cURL with a range request:

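$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/');
// Ask for bytes 0-500 only; a server without range support ignores this.
curl_setopt($ch, CURLOPT_RANGE, '0-500');
// Return the response as a string instead of printing it.
// (CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3, so it is omitted.)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
echo $result;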

That requests only bytes 0 through 500 (the range is inclusive, so 501 bytes), provided the server honours range requests. You can benchmark it with something like the following, admittedly ugly, code:

$url = 'http://www.example.com/';
$range = array();
$repeats = 10;

function average($a){
  return array_sum($a)/count($a) ;
}

for ($i=0;$i<$repeats;$i++) {
    $time_start = microtime(true);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RANGE, '0-500');
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);

    $time_end = microtime(true);
    $time = $time_end - $time_start;
    curl_close($ch);
    $range[] = $time;
}
echo "With range: average = ".round(average($range),2)." seconds (Min: ".round(min($range),2).", Max: ".round(max($range),2).")\n";

$range = array();

for ($i=0;$i<$repeats;$i++) {
    $time_start = microtime(true);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);

    $time_end = microtime(true);
    $time = $time_end - $time_start;
    curl_close($ch);
    $range[] = $time;
}
echo "Without range: average = ".round(average($range),2)." seconds (Min: ".round(min($range),2).", Max: ".round(max($range),2).")\n";

If I run that on my site (http://www.focalstrategy.com/), I get:

With range: average = 0.38 seconds (Min: 0.35, Max: 0.41)
Without range: average = 0.56 seconds (Min: 0.53, Max: 0.7)

Against http://en.wikipedia.org/wiki/PHP, I get:

With range: average = 0.11 seconds (Min: 0.05, Max: 0.5)
Without range: average = 0.48 seconds (Min: 0.34, Max: 0.78)

Against Stack Overflow I get:

With range: average = 1.31 seconds (Min: 1.1, Max: 1.46)
Without range: average = 1.37 seconds (Min: 1.18, Max: 1.7)

and against eBay I get:

With range: average = 1.75 seconds (Min: 1.56, Max: 1.99)
Without range: average = 1.74 seconds (Min: 1.51, Max: 2.14)

The timings with and without the range are essentially the same for Stack Overflow and eBay, which suggests that they don’t support range requests for these pages.
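Timing is only an indirect signal. A more direct check is the HTTP status code: a server that honours the range answers 206 (Partial Content), while one that ignores it answers 200 with the full page. The following is a minimal sketch of that check; fetchRange() and isPartialResponse() are hypothetical helper names, not part of the code above.

```php
<?php
// Hypothetical helper: true when the status code means the server
// actually sent a partial body in response to a Range request.
function isPartialResponse(int $status): bool
{
    return $status === 206; // 206 = Partial Content
}

// Hypothetical helper: request the first $bytes bytes of $url and report
// the status code alongside whatever body came back.
function fetchRange(string $url, int $bytes): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Byte ranges are inclusive, so the first $bytes bytes are 0..$bytes-1.
    curl_setopt($ch, CURLOPT_RANGE, '0-' . ($bytes - 1));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [
        'body'    => $body === false ? '' : $body,
        'status'  => $status,
        'partial' => isPartialResponse($status),
    ];
}
```

Calling fetchRange('http://www.example.com/', 500) and inspecting the 'partial' key should show whether that particular server cut the response short or sent everything.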

In summary, sites that support range requests will respond faster; sites that don’t will simply send the whole page.
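For sites that ignore the range, one possible workaround is to stop the transfer from the client side once enough data has arrived. The sketch below uses cURL's CURLOPT_WRITEFUNCTION callback and aborts by returning a value that differs from the chunk length. The function names and the 8192-byte default limit are assumptions chosen for illustration, and the regex is a simple heuristic rather than a full HTML parser.

```php
<?php
// Hypothetical helper: pull the text of the first <title> element out of a
// (possibly truncated) chunk of HTML, or return null if none is present.
function extractTitle(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

// Hypothetical helper: download $url only until the closing </title> tag
// has been seen or $maxBytes have been received, then extract the title.
function fetchTitle(string $url, int $maxBytes = 8192): ?string
{
    $buffer = '';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION,
        function ($ch, $chunk) use (&$buffer, $maxBytes) {
            $buffer .= $chunk;
            if (stripos($buffer, '</title>') !== false || strlen($buffer) >= $maxBytes) {
                return -1; // any value other than strlen($chunk) aborts the transfer
            }
            return strlen($chunk); // keep receiving
        }
    );
    curl_exec($ch); // returns false when the callback aborts; expected here
    curl_close($ch);

    return extractTitle($buffer);
}
```

This only shortens the download itself; the connection setup and the server's time to produce the page still count, so the gain on slow sites may be small.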
