I’ve been doing some profiling on different methods of accessing large(ish) arrays of data in PHP. The use case is pretty simple: some of our tools output data into PHP files as associative arrays and these files are considered static data by the application. We make games, so some examples of data files would include items in a catalog, tasks that a user must complete, or definitions for maps:
<?php
$some_data = array(
    // ...lots and lots of stuff in here...
);
?>
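The exporter itself isn't part of the post. Assuming the tool is written in PHP (or can call it), var_export() prints a value as parseable PHP code and can produce a file in exactly this layout. The following is a minimal sketch: writeDataFile() and the sample items are hypothetical.

```php
<?php
// Hypothetical exporter helper: writes $data to $path as a PHP file that
// assigns the array to a variable named $varName, matching the layout above.
function writeDataFile($path, $varName, array $data)
{
    $code = "<?php\n"
          . '$' . $varName . ' = ' . var_export($data, true) . ";\n";
    return file_put_contents($path, $code) !== false;
}

// Made-up sample data; the real files are much larger (~400K).
$items = array(
    'sword'  => array('price' => 100, 'level' => 3),
    'shield' => array('price' => 80,  'level' => 2),
);
$path = sys_get_temp_dir() . '/some_data.php';
writeDataFile($path, 'some_data', $items);

include $path;                              // defines $some_data in this scope
echo $some_data['sword']['price'], "\n";    // 100
```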
Since these arrays are large-ish (400K) and much of our code needs this data, we have to access it as efficiently as possible. I settled on timing three different patterns for doing this. I present the methods first and then share my results.
What I’m looking for is some experience-based validation of these methods and their timings, as well as any other methods to try out.
Method #1: getter function
In this method, the exporter creates a file that looks like this:
<?php
function getSomeData()
{
    $some_data = array(
        // ...lots and lots of stuff here...
    );
    return $some_data;
}
?>
Client code can then get the data simply by calling getSomeData() whenever it needs it.
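A minimal sketch of Method #1 in use, with made-up task data standing in for the generated array. In a real project the data file would be loaded with require_once, because declaring getSomeData() a second time is a fatal error. totalReward() is a hypothetical client function.

```php
<?php
// Stand-in for the generated data file (normally loaded via require_once).
function getSomeData()
{
    $some_data = array(
        'task_1' => array('name' => 'Collect wood', 'reward' => 10),
        'task_2' => array('name' => 'Build a hut',  'reward' => 25),
    );
    return $some_data;
}

// Hypothetical client code: no globals, any local name will do.
function totalReward()
{
    $total = 0;
    foreach (getSomeData() as $task) {
        $total += $task['reward'];
    }
    return $total;
}

echo totalReward(), "\n";   // 35
```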
Method #2: global + include
In this method, the data file looks identical to the first code block above; however, the client code must jump through a few hoops to get the data into local scope. This assumes the array is in a file called ‘some_data.php’:
global $some_data; //must be the same name as the variable in the data file...
include 'some_data.php';
This brings the $some_data array into scope, though in my opinion it is a bit cumbersome for client code.
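A sketch of Method #2 wrapped in a helper, so client code only writes the global + include dance once; the temp file and loadSomeDataGlobal() are hypothetical. It also shows one more pattern that may be worth timing, which is not one of the three measured here: if the data file ends with return array(...); instead of assigning a variable, include itself evaluates to the array, so no global is needed. In both cases, without an opcode cache, include reads and compiles the file on every call, which is one plausible reason Method #2 times slower.

```php
<?php
// Hypothetical setup: a small data file in the Method #2 layout.
$file = sys_get_temp_dir() . '/some_data_method2.php';
file_put_contents($file, '<?php $some_data = array("map_1" => "Forest", "map_2" => "Desert");');

function loadSomeDataGlobal($file)
{
    global $some_data;   // must match the variable name used in the data file
    include $file;       // the assignment writes through to the global
    return $some_data;
}

$maps = loadSomeDataGlobal($file);
echo $maps['map_2'], "\n";   // Desert

// Variant not in the original post: the data file returns the array,
// so include's value can be assigned to any local name.
$file2 = sys_get_temp_dir() . '/some_data_return.php';
file_put_contents($file2, '<?php return array("map_1" => "Forest", "map_2" => "Desert");');
$anyName = include $file2;
```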
Method #3: getter by reference
This method is nearly identical to Method #1, except that the getter does not return a value; instead, it assigns the data to a variable passed in by reference.
<?php
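function getSomeDataByRef(&$some_data)
{
    $some_data = array(
        // ...lots and lots of stuff here...
    );
    // no return needed: the caller's variable now holds the data
}
?>
Client code then retrieves the data by declaring a local variable (called anything) and passing it by reference to the getter:
$some_data_anyname = array();
// the & belongs in the function signature only; call-time pass-by-reference
// (&$var at the call site) is deprecated in PHP 5.3 and a fatal error from PHP 5.4
getSomeDataByRef($some_data_anyname);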
Results
So I ran a little script that runs each of these methods of retrieving data 1000 times and averages the run time (computed from microtime(true) at the beginning and end of each run). The following are my results (in seconds, which is what microtime(true) returns; running on a MacBook Pro, 2 GHz, 8 GB RAM, PHP 5.3.4):
Method                  AVG                 MAX                 MIN
#1 getter function      0.0031637034416199  0.0043289661407471  0.0025908946990967
#2 global + include     0.01434082698822    0.018275022506714   0.012722969055176
#3 getter by reference  0.00335768699646    0.0043489933013916  0.0029017925262451
It seems pretty clear, from this data anyway, that the global + include method is slower than the other two (roughly 4.5 times slower on average), while the difference between Methods #1 and #3 is negligible.
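The timing script itself isn't included. A sketch of a loop matching the description (1000 runs, microtime(true) before and after each call, then average, max and min) might look like this; timeIt() and the stand-in getter are hypothetical. On PHP 7.3+, hrtime(true) gives a monotonic nanosecond counter that is better suited to this kind of measurement than wall-clock microtime(true).

```php
<?php
// Hypothetical harness: call $fn $runs times, timing each call in seconds.
function timeIt($fn, $runs = 1000)
{
    $times = array();
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $times[] = microtime(true) - $start;
    }
    return array(
        'AVG' => array_sum($times) / count($times),
        'MAX' => max($times),
        'MIN' => min($times),
    );
}

// Stand-in getter; the real data files are far larger.
function getSomeData()
{
    return array('a' => 1, 'b' => 2, 'c' => 3);
}

$stats = timeIt('getSomeData');
foreach ($stats as $label => $seconds) {
    printf("%s: %.10f\n", $label, $seconds);
}
```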
Thoughts?
Am I completely missing anything? (probably…)
Thanks in advance!
Not sure if this is exactly what you're looking for, but it should help with speed and memory issues. You can use the fixed-size SPL array, SplFixedArray:
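The code from this answer is missing. Here is a minimal sketch of SplFixedArray, which ships with PHP 5.3+ as part of the SPL, using hypothetical data. One caveat: it only accepts integer indexes from 0 to size-1, so it fits list-like data rather than the associative arrays in the question, and any nested rows are still ordinary PHP arrays.

```php
<?php
// Hypothetical list-like catalog rows (0-indexed).
$rows = array(
    array('id' => 1, 'name' => 'Sword'),
    array('id' => 2, 'name' => 'Shield'),
    array('id' => 3, 'name' => 'Potion'),
);

// fromArray() copies a 0-indexed PHP array into a fixed-size structure.
$fixed = SplFixedArray::fromArray($rows);
echo $fixed->getSize(), "\n";    // 3
echo $fixed[1]['name'], "\n";    // Shield

// Or allocate the size up front and fill by index.
$ids = new SplFixedArray(count($rows));
foreach ($rows as $i => $row) {
    $ids[$i] = $row['id'];
}
```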
Read more about how big PHP arrays really are here:
http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
Also, have you thought about caching the data in memory? For example, you could load it into MySQL using the MEMORY storage engine on the first execution and then access the data from there:
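The code for this suggestion is missing as well. The following sketch swaps in a different technique rather than the database setup described. It caches the array in a static variable for the rest of the request and, if the APCu extension is loaded and enabled, in shared memory across requests. APCu is off on the CLI unless apc.enable_cli is set. buildSomeData(), getSomeDataCached() and the cache key are hypothetical.

```php
<?php
// Hypothetical builder standing in for the generated data file.
function buildSomeData()
{
    return array('map_1' => 'Forest', 'map_2' => 'Desert');
}

function getSomeDataCached()
{
    static $cache = null;          // per-request, in-process cache
    if ($cache !== null) {
        return $cache;
    }

    $key = 'some_data_v1';         // hypothetical key; change it when the data changes
    $useApcu = function_exists('apcu_enabled') && apcu_enabled();

    if ($useApcu) {
        $found = false;
        $data = apcu_fetch($key, $found);   // shared across requests
        if ($found) {
            return $cache = $data;
        }
    }

    $cache = buildSomeData();
    if ($useApcu) {
        apcu_store($key, $cache);
    }
    return $cache;
}

$maps = getSomeDataCached();
echo $maps['map_1'], "\n";   // Forest
```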