Lately I had an issue with an array that contained some hundred thousands of values and the only thing I wanted to do was to check whether a value was already present.
In my case this were IPs from a webserver log.
So basically something like:
in_array(ip2long(ip),$myarray) did the job
However the lookup time increased dramatically and 10k of lookups took around 17 seconds or so.
So in this case I didn’t care whether I had duplicates or not, I just needed to check for existence. So I could store the IPs in the index like this:
isset($myarray[ip2long($ip)])
And boom, lookup times went down from 17 seconds (and more) to a static time of 0.8 seconds for 10k lookups. As a value for the array entry I just used int 1.
I think the array index is probably based on some b-tree which should have log(n) lookup time and the index on a hashmap.
In my case using the index worked fine, but are there any data structures where I can use hashmaps as a value index, where multiple values may also occour (i realize that this makes only sense if do not have too many duplicates and I cannot use range/search requests efficiently, which is the primary benefit of tree structures)?
There are a whole range of alternatives datastructures beyond simple arrays in the SPL library bundled with PHP, including linked lists, stacks, heaps, queues, etc.
However, I suspect you could make your logic a whole lot more efficient if you flipped your array, allowing you to do a lookup on the key (using the array_key_exists() function) rather than search for the value. The array index is a hash, rather than a btree, making for very fast direct access via the key.
However, if you’re working with 10k entries in an array, you’d probably be better taking advantage of a database, where you can define your own indexes.