I’m trying to create a program that grabs data from a website an arbitrary number of times, and I’m looking for a way to do so without huge delays.
Currently I use the following code, and it’s rather slow, even though it only grabs stats for a handful of people. I’m expecting to do about 100 at a time:
<?php
// Skill names, in the order their lines appear on the hiscore page.
$skills = array(
    "overall", "attack", "defense", "strength", "constitution", "ranged",
    "prayer", "magic", "cooking", "woodcutting", "fletching", "fishing",
    "firemaking", "crafting", "smithing", "mining", "herblore", "agility",
    "thieving", "slayer", "farming", "runecrafting", "hunter", "construction",
    "summoning", "dungeoneering"
);
$participants = array("Zezima", "Allar", "Foot", "Arma150", "Green098", "Skiller 703", "Quuxx");//explode("\r\n", $_POST['names']);

// array_search() takes the needle first and the haystack second.
// Fall back to "overall" (index 0) when no skill, or an unknown one, is requested.
$skill = isset($_GET['skill']) ? array_search($_GET['skill'], $skills) : false;
if ($skill === false) {
    $skill = 0;
}
display($participants, $skills, $skill);

function getAllStats($participants) {
    $stats = array();
    foreach ($participants as $name) {
        $stats[] = getStats($name); // one blocking HTTP request per player
    }
    return $stats;
}

function display($participants, $skills, $stat) {
    $all = getAllStats($participants);
    for ($i = 0; $i < count($participants); $i++) {
        if ($all[$i] === false) {
            continue; // lookup failed for this player
        }
        // Each skill line is "rank,level,experience", so the columns are 0, 1 and 2.
        $rank       = getSkillData($all[$i], 0, $stat);
        $level      = getSkillData($all[$i], 1, $stat);
        $experience = getSkillData($all[$i], 2, $stat);
        // ... output $rank, $level and $experience here
    }
}

function getStats($username, $timeout = 10) {
    // Names can contain spaces ("Skiller 703"), so encode them for the query string.
    $curl = curl_init("http://hiscore.runescape.com/index_lite.ws?player=" . rawurlencode($username));
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($curl, CURLOPT_USERAGENT, sprintf("Mozilla/%d.0", rand(4, 5)));
    curl_setopt($curl, CURLOPT_HEADER, 0);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    $output = curl_exec($curl);
    // The HTTP status code is only available after curl_exec() has run.
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    if ($output === false || strstr($output, "<html><head><title>")) {
        return false;
    }
    return $output;
}

function getSkillData($stats, $row, $skill) {
    $stats = explode("\n", $stats);
    $levels = explode(",", $stats[$skill]);
    return $levels[$row];
}
When I benchmarked this it took about 5 seconds, which isn’t too bad, but imagine if I were doing this 93 more times. I understand it won’t be instant, but I’d like to get it under 30 seconds. I know it’s possible because I’ve seen websites that do something similar within 30 seconds.
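A back-of-the-envelope estimate of what the 30-second goal requires, assuming the 5-second benchmark covered the seven names in $participants, that every request takes about the same time, and that the hiscore server answers k concurrent requests as quickly as one (k is introduced here for illustration):

```latex
\begin{aligned}
t_{\text{player}} &\approx \frac{5\,\text{s}}{7} \approx 0.71\,\text{s} \\
T_{\text{seq}} &\approx 100 \times 0.71\,\text{s} \approx 71\,\text{s} \\
T_{\text{par}} &\approx \frac{T_{\text{seq}}}{k} \le 30\,\text{s}
  \;\Longrightarrow\; k \ge \frac{71}{30} \approx 2.4 \;\Longrightarrow\; k = 3
\end{aligned}
```

Under those optimistic assumptions, about three requests in flight at a time would meet the target. Real servers rarely scale perfectly, so the number actually needed may be higher.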
I’ve read about caching the data, but that won’t work because the cached data would simply be old. I’m using a database (further on; I haven’t gotten to that part yet) to store old data, while new data is retrieved in real time (as in the code above).
Is there a way to do something like this without massive delays (and, ideally, without overloading the server I am reading from)?
P.S.: The website I’m reading from is plain text with no HTML to parse, which should reduce the loading time. Here’s an example of what a page looks like (they’re all the same, just with different numbers):
69,2496,1285458634
10982,99,33055154
6608,99,30955066
6978,99,40342518
12092,99,36496288
13247,99,21606979
2812,99,13977759
926,99,36988378
415,99,153324269
329,99,59553081
472,99,40595060
2703,99,28297122
281,99,36937100
1017,99,19418910
276,99,27539259
792,99,34289312
3040,99,16675156
82,99,39712827
80,99,104504543
2386,99,21236188
655,99,28714439
852,99,30069730
29,99,200000000
3366,99,15332729
2216,99,15836767
154,120,200000000
-1,-1
-1,-1
-1,-1
-1,-1
-1,-1
30086,2183
54640,1225
89164,1028
123432,1455
-1,-1
-1,-1
My previous benchmark with this method vs. curl_multi_exec:
function getTime() {
    // microtime(true) returns the current Unix time in seconds as a float.
    return microtime(true);
}
function benchmarkFunctions() {
    $start = getTime();
    old_f();
    $end = getTime();
    echo 'function old_f() took ' . round($end - $start, 4) . ' seconds to complete<br><br>';
    $start = getTime();
    new_f();
    $end = getTime();
    echo 'function new_f() took ' . round($end - $start, 4) . ' seconds to complete';
}
function old_f() {
    $test = array("A E T", "Ts Danne", "Funkymunky11", "Fast993", "Fast99Three", "Jeba", "Quuxx");
    getAllStats($test);
}
function new_f() {
    $test = array("A E T", "Ts Danne", "Funkymunky11", "Fast993", "Fast99Three", "Jeba", "Quuxx");
    $curl_arr = array();
    $master = curl_multi_init();
    $amt = count($test);
    for ($i = 0; $i < $amt; $i++) {
        $curl_arr[$i] = curl_init('http://hiscore.runescape.com/index_lite.ws?player=' . rawurlencode($test[$i]));
        curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($master, $curl_arr[$i]);
    }
    // Drive all transfers at once; curl_multi_select() waits for activity instead of busy-looping.
    do {
        curl_multi_exec($master, $running);
        if ($running > 0) {
            curl_multi_select($master, 1.0);
        }
    } while ($running > 0);
    // Read the responses curl_multi_exec() already downloaded. Calling curl_exec()
    // here would send every request again, one after another.
    $results = array();
    for ($i = 0; $i < $amt; $i++) {
        $results[$i] = curl_multi_getcontent($curl_arr[$i]);
        curl_multi_remove_handle($master, $curl_arr[$i]);
        curl_close($curl_arr[$i]);
    }
    curl_multi_close($master);
    return $results;
}
You can reuse curl connections. Also, I changed your code to check the httpCode instead of using strstr; that should be quicker. You can also set up curl to make the requests in parallel, which I’ve never tried. See http://www.php.net/manual/en/function.curl-multi-exec.php
An improved getStats() with a reused curl handle. Usage: