The situation:
- I have multiple arrays containing many complex objects; each array stores different data, but all in the same format.
- These arrays (of objects) are too complex to be stored in an SQL table, so I serialize them and store each array in a separate file.
- I use the PHP function `file_get_contents()` to read the data, and then I call `unserialize()` on it.
- I have to load one file (max. 100 MB) per client request, `unserialize()` it and process it.
- This data is not the same for every client.
- All data in total is around 3 GB.
- This data is updated every 24 hours, and it grows with each update.
- The maximum size per file is 100 MB.
The problem:
- The method I am currently using works fine for small files (up to 5 MB).
- But for larger files, it takes far too much time.
- The function `unserialize()` takes about 33 seconds to execute when I load a file of around 40 MB.
- So the main problem with my current method is `unserialize()`.
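Before changing formats, it can help to time the two steps of the load path separately, to confirm that `unserialize()` (and not the file read) dominates. A minimal sketch; the helper name `timeLoad()` and the generated sample data are hypothetical, not taken from the question:

```php
<?php
// Minimal timing sketch: measure file reading and unserialize() separately
// to see which step is slow. Helper name and sample data are made up.

function timeLoad(string $path): array
{
    $t0 = microtime(true);
    $raw = file_get_contents($path);            // step 1: read the bytes
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $t1 = microtime(true);
    $data = unserialize($raw);                  // step 2: rebuild PHP values
    $t2 = microtime(true);

    return [
        'data'          => $data,
        'read_s'        => $t1 - $t0,
        'unserialize_s' => $t2 - $t1,
    ];
}

// Build a small serialized sample file to try it on.
$sample = [];
for ($i = 0; $i < 1000; $i++) {
    $obj = new stdClass();
    $obj->id = $i;
    $obj->tags = ['a', 'b', 'c'];
    $sample[] = $obj;
}
$path = tempnam(sys_get_temp_dir(), 'ser');
file_put_contents($path, serialize($sample));

$result = timeLoad($path);
printf("read: %.4f s, unserialize: %.4f s\n", $result['read_s'], $result['unserialize_s']);
```

Pointing `timeLoad()` at one of the real 40 MB files gives the per-step numbers to compare against any alternative format.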
The main question:
- How can I store my very complex objects without serializing them, or how can I make unserialization faster?
If you need PHP objects that are not `stdClass` (you have class definitions next to the data members), you need to use some kind of PHP-compatible serialization. Independent of the PHP language, serialization comes at a price, because it is data transformation and mapping. If you have a large amount of data that needs to be transposed into and out of string (binary) information, that costs processing time and memory.
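To illustrate the `stdClass` point, here is a minimal sketch with a hypothetical `Measurement` class: `serialize()`/`unserialize()` restore real class instances (the class must be defined when unserializing), while a JSON round trip only gives back generic `stdClass` objects.

```php
<?php
// Hypothetical class standing in for one of the "complex objects".
class Measurement
{
    public $sensor;
    public $values;

    public function __construct(string $sensor, array $values)
    {
        $this->sensor = $sensor;
        $this->values = $values;
    }

    public function average(): float
    {
        return array_sum($this->values) / count($this->values);
    }
}

$original = [new Measurement('t1', [1, 2, 3]), new Measurement('t2', [4, 6])];

// serialize() records the class name, so unserialize() rebuilds
// real Measurement instances, methods included.
$restored = unserialize(serialize($original));

// JSON only knows objects/arrays: the class information is lost.
$fromJson = json_decode(json_encode($original));

var_dump(get_class($restored[0]));   // string(11) "Measurement"
var_dump(get_class($fromJson[0]));   // string(8) "stdClass"
echo $restored[1]->average(), "\n";  // 5
```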
The default is PHP's built-in serialization, which you use via `serialize` and `unserialize`. PHP offers two default serialization types; other extensions offer something similar. Related question: *php_binary serialization handler?* Since, as you've said, you need some kind of serialization and unserializing is the bottleneck, you could consider choosing another serializer such as igbinary.
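One way to try igbinary without rewriting the application is to hide the serializer behind two small helpers. A sketch, assuming the igbinary PECL extension may or may not be installed; the helper names `pack_data()`/`unpack_data()` are hypothetical:

```php
<?php
// Hypothetical helpers: use igbinary when the extension is loaded,
// otherwise fall back to PHP's built-in serializer.
// Caveat: data written with one format must be read back with the same one,
// so the extension must be present (or absent) consistently.

function pack_data($value)
{
    if (function_exists('igbinary_serialize')) {
        return igbinary_serialize($value);   // compact binary format
    }
    return serialize($value);                // PHP's built-in text format
}

function unpack_data(string $payload)
{
    if (function_exists('igbinary_unserialize')) {
        return igbinary_unserialize($payload);
    }
    return unserialize($payload);
}

$data = ['rows' => range(1, 5), 'meta' => (object) ['source' => 'demo']];
$restored = unpack_data(pack_data($data));
var_dump($restored == $data); // bool(true)
```

Timing `unpack_data()` on a real file, with and without the extension, gives a direct comparison.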
However, storing PHP in flat files works, too. See `var_export`: it writes data as PHP source code, in a format that PHP can read back in. This is useful for structured data in the form of `stdClass` objects and arrays. Reading it back in is pretty straightforward: if the file was written as `<?php return ...;`, `include` it and use the value it returns.
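A minimal sketch of that round trip, assuming the data contains only scalars, arrays and `stdClass` objects (objects of your own classes would be exported as `Class::__set_state(...)` calls and need a `__set_state()` method). Exporting `stdClass` as `(object) array(...)` requires PHP 7.3 or later; the helper names are hypothetical:

```php
<?php
// Hypothetical helpers: write data as PHP source, read it back with include.

function export_to_file(string $file, $data): void
{
    $code = '<?php return ' . var_export($data, true) . ';' . PHP_EOL;
    if (file_put_contents($file, $code, LOCK_EX) === false) {
        throw new RuntimeException("Cannot write $file");
    }
}

function import_from_file(string $file)
{
    return include $file;   // the file's "return" becomes include's value
}

$data = [
    'client' => 42,
    'items'  => [
        (object) ['name' => 'a', 'qty' => 3],
        (object) ['name' => 'b', 'qty' => 1],
    ],
];

$file = tempnam(sys_get_temp_dir(), 'exp');
export_to_file($file, $data);
$loaded = import_from_file($file);

echo $loaded['items'][0]->name, "\n"; // a
```

Because the file is ordinary PHP code, OPcache may cache its compiled form; if so, check OPcache's timestamp-validation settings so that the daily updates are picked up.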
If you put the PHP code into the database, you don't need the `<?php` prefix.
The benefit of using `var_export` is that you make use of PHP itself to parse the data. It's normally faster than `serialize`/`unserialize`, but for your case you need to measure that anyway.
I suggest you try `var_export` to see how it behaves in terms of file size and speed, and also try igbinary. Then compare. Add the results to your question once you have them, so that further suggestions can be given in case this does not solve your issue.
Another thing that comes to mind is using the JSON format. Some data stores are optimized for it, so you can query the store directly. The map-reduce methodology can also be used with many of these data stores, so you can spread the processing of the data. That's something you won't get directly with `serialize`/`unserialize`, as they always process one big chunk of data at a time; you cannot split it up.
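A flat-file stand-in for the "not one big chunk" idea: this is not a JSON data store or map-reduce, just a sketch assuming one JSON record per line (the JSON Lines convention), so each record can be decoded and processed on its own. The helper names and record layout are hypothetical, and JSON gives back arrays or `stdClass` objects, not your own classes. `JSON_THROW_ON_ERROR` requires PHP 7.3 or later.

```php
<?php
// Hypothetical helpers: one JSON document per line, read back lazily.

function write_json_lines(string $file, iterable $records): void
{
    $fh = fopen($file, 'wb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $file");
    }
    foreach ($records as $record) {
        // json_encode escapes newlines inside strings, so one record = one line
        fwrite($fh, json_encode($record, JSON_THROW_ON_ERROR) . "\n");
    }
    fclose($fh);
}

function read_json_lines(string $file): Generator
{
    $fh = fopen($file, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            if (trim($line) === '') {
                continue;                        // skip blank lines
            }
            yield json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    } finally {
        fclose($fh);
    }
}

$file = tempnam(sys_get_temp_dir(), 'jsl');
write_json_lines($file, [
    ['id' => 1, 'total' => 10.5],
    ['id' => 2, 'total' => 4.5],
    ['id' => 3, 'total' => 5.0],
]);

// Only one decoded record is held in memory at a time.
$sum = 0.0;
foreach (read_json_lines($file) as $row) {
    $sum += $row['total'];
}
echo $sum, "\n"; // 20
```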