Gperf consistently underperforms a Judy array in my environment, and I’m wondering if there is another perfect hash library out there built specifically for integer keys. I know the set of keys beforehand, and I would like to leverage that into a performance/size advantage.
There are ~1000 keys, and retrievals are not in sequential order. Key pairs are both integers. Keys are 32-bit, and retrieved values are 8-bit. Size is the most important factor.
If there is a way to tweak Gperf for integer keys, or just another approach in general, I’m all ears, too. 🙂
(Sidenote: …While typing out this question, I realized a binary search probably takes the cake and I’ve just over-thought the problem. I’d still like to hear any thoughts you may have for the sake of learning, though!)
Edit: Keys are not evenly distributed. Most are randomly clustered throughout the entire possible range.
Edit 2: Worst-case binary searches were too slow for my taste, so I ended up playing with the keys until I found 8 bits to use from each to make 256 well-distributed buckets. I held the min & max of each bucket (24 bits for each bucket entry), and made one big struct array for the key-pairs. On-par/faster and smaller than everything else I tested with for my particular case, so I guess I’m going with that for now. 🙂
Keep your keys sorted, and use an M-tree to retrieve any key.
An M-tree has M entry per node, instead of 2 for binaries.
It will help tremendously for performance.
Use a cache line size as the basis of your node size, hence 64 Bytes.
You can store 16 32bits values in this size.
Since you have 1000 values, 3 levels will be enough to retrieve the right key (as opposed to 10 levels for a binary tree).
Another idea would be to hash your keys into a small hash-table, such as 12-bits one (4K entries), and solve the potential collisions with a simple chain. You are very likely to get most of your keys in a single search.