I’m writing a command line application in PHP that accepts a path to a local input file as an argument. The input file will contain one of the following things:
- JSON encoded associative array
- A serialize()d version of the associative array
- A base 64 encoded version of the serialize()d associative array
- Base 64 encoded JSON encoded associative array
- A plain old PHP associative array
- Rubbish
In short, there are several dissimilar programs that I have no control over that will be writing to this file, in a uniform way that I can understand, once I actually figure out the format. Once I figure out how to ingest the data, I can just run with it.
What I’m considering is:
- If the first byte of the file is {, try json_decode(), see if it fails.
- If the first byte of the file is < or $, try include(), see if it fails.
- If the first three bytes of the file match a:[0-9], try unserialize().
- If not the first three, try base64_decode(), see if it fails. If not:
  - Check the first bytes of the decoded data, again.
- If all of that fails, it’s rubbish.
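As a rough sketch, the steps above could look something like the following. The function name detect_format() and the returned labels are made up for illustration, and it works on the file contents as a string. It only labels PHP source rather than calling include(), since including an untrusted file would execute it.

```php
<?php
// Hypothetical helper: classify a payload by its leading bytes, then confirm
// with the matching decoder. Returns a format label, or 'rubbish'.
function detect_format(string $data, bool $allowBase64 = true): string
{
    $data = trim($data);
    if ($data === '') {
        return 'rubbish';
    }

    // JSON object: starts with "{" and decodes to an array.
    if ($data[0] === '{' && is_array(json_decode($data, true))) {
        return 'json';
    }

    // PHP source: starts with "<" or "$". Only labeled here, never included,
    // because include() would run whatever the file contains.
    if ($data[0] === '<' || $data[0] === '$') {
        return 'php';
    }

    // Serialized array: starts with "a:<digit>" and unserializes to an array.
    if (preg_match('/^a:[0-9]/', $data) === 1) {
        $value = @unserialize($data, ['allowed_classes' => false]);
        if (is_array($value)) {
            return 'serialized';
        }
    }

    // Otherwise try strict base64 once, then check the decoded bytes again.
    if ($allowBase64) {
        $decoded = base64_decode($data, true);
        if ($decoded !== false) {
            $inner = detect_format($decoded, false);
            if ($inner !== 'rubbish') {
                return 'base64+' . $inner;
            }
        }
    }

    return 'rubbish';
}
```

Calling it with file_get_contents($argv[1]) would give a label such as 'json' or 'base64+serialized' to dispatch on.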
That just seems quite expensive for quite a simple task. Could I be doing it in a better way? If so, how?
There isn’t much to optimize here. The magic bytes approach is already the way to go. But of course the actual deserialization functions can be avoided. It’s feasible to use a verification regex for each format instead (regexes, despite the meme, are often faster than having PHP actually unpack a nested array).
- base64 is easy enough to probe for.
- json can be checked with a regex. "Fastest way to check if a string is JSON in PHP?" is the RFC version for securing it in JS. But it would be feasible to write a complete JSON (?R) match rule.
- serialize is a bit more difficult without a proper unpack function. But with some heuristics you can already assert that it’s a serialize blob.
- php array scripts can be probed a bit faster with token_get_all. Or if the format and data is constrained enough, again with a regex.

The more important question here is, do you need reliability, or simplicity and speed?
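To make the probing idea concrete, here is a minimal sketch of three of these checks. The function names are hypothetical, the patterns are heuristics rather than full validators, and the tokenizer extension (enabled by default) is assumed to be available. Nothing is decoded, unserialized, or executed.

```php
<?php
// Hypothetical probes: each returns true when the payload merely looks like
// the format. They trade reliability for speed and simplicity.

function looks_like_base64(string $s): bool
{
    // Ignore line breaks and other whitespace that encoders may insert.
    $s = preg_replace('/\s+/', '', $s);
    // Base64 alphabet only, optional "=" padding, length a multiple of 4.
    return $s !== ''
        && strlen($s) % 4 === 0
        && preg_match('/^[A-Za-z0-9+\/]+={0,2}$/', $s) === 1;
}

function looks_like_serialized_array(string $s): bool
{
    // Heuristic: "a:<count>:{" at the start and "}" at the end.
    return preg_match('/^a:\d+:\{.*\}$/s', trim($s)) === 1;
}

function looks_like_php_script(string $s): bool
{
    // token_get_all() only tokenizes the source; it does not run it.
    foreach (token_get_all($s) as $token) {
        if (is_array($token) && $token[0] === T_OPEN_TAG) {
            return true;
        }
    }
    return false;
}
```

Short plain words such as "test" also pass the base64 probe, so a positive result is only a hint; if reliability matters more than speed, confirm with the real decoder afterwards.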