I am looking for an associative collection that supports both retrieval and insertion of values by key (deletion is not important) in O(log N) time or better, and that has very low memory overhead in terms of both code size and run-time memory consumption.
I am doing this for a small embedded application written in C, so I am trying to minimize the amount of code required, and the amount of memory consumed.
The Google sparse hash data structure would be a possibility if it were simpler and not written in C++.
Most hash table implementations that I am aware of use a fair amount of extra space, requiring at least twice as much space as the key-value pairs themselves, or else requiring extra pointers per entry (e.g. bucket-chaining hash tables). In my structure, key-value pairs are just two pointers.
Currently I am using an array of key/value pairs which is sorted, but the insertion is O(N). I can’t help but think there must be a clever way to improve the amortized running time of insertion, for example by doing the insertions in groups, but I am not having any success.
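For concreteness, here is a minimal sketch of the kind of sorted-array map described above. The names (`pair_t`, `sorted_map_t`, `sm_get`, `sm_put`) and the 32-bit key with a pointer value are assumptions made for illustration, not the actual code. Lookup is a binary search in O(log N); insertion has to shift the tail of the array, which is where the O(N) cost comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: an integer key plus a pointer value. */
typedef struct { uint32_t key; void *value; } pair_t;

typedef struct {
    pair_t *items;  /* caller-provided storage, kept sorted by key */
    size_t  len;    /* number of pairs currently stored */
    size_t  cap;    /* capacity of items[] */
} sorted_map_t;

/* Index of the first element whose key is >= k (binary search). */
static size_t sm_lower_bound(const sorted_map_t *m, uint32_t k)
{
    size_t lo = 0, hi = m->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (m->items[mid].key < k) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* O(log N) lookup; returns NULL when the key is absent. */
static void *sm_get(const sorted_map_t *m, uint32_t k)
{
    size_t i = sm_lower_bound(m, k);
    return (i < m->len && m->items[i].key == k) ? m->items[i].value : NULL;
}

/* Insert or overwrite. Returns 0 on success, -1 if the array is full. */
static int sm_put(sorted_map_t *m, uint32_t k, void *v)
{
    size_t i = sm_lower_bound(m, k);
    assert(m->len <= m->cap);
    if (i < m->len && m->items[i].key == k) {
        m->items[i].value = v;
        return 0;
    }
    if (m->len == m->cap) return -1;
    /* Shift the tail up by one slot: this is the O(N) step. */
    memmove(&m->items[i + 1], &m->items[i], (m->len - i) * sizeof m->items[0]);
    m->items[i].key = k;
    m->items[i].value = v;
    m->len++;
    return 0;
}
```

The memory overhead here is zero beyond the pairs themselves, which is the property I would like to keep.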
I think that this must be a relatively well-known problem in certain circles, so to make this not too subjective, I’m wondering what the most common solution to the problem stated above is?
[EDIT:]
Some additional information that could be relevant:
- Keys are integers
- Number of values could be anywhere from tiny (1) up to 2^32.
- Usage patterns are unpredictable.
- I am hoping to keep memory consumption as low as possible (e.g. doubling the amount of memory required would not be ideal).
You could use a hash table that doesn’t use chaining, such as a linear probing or cuckoo hashing scheme. The backing implementation is just an array, and with a load factor of around 0.5 (roughly two slots per stored entry) the overhead won’t be too bad. The implementation complexity, at least for linear or quadratic probing, is also modest.
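To give a feel for how little code linear probing needs, here is a minimal sketch. Everything in it is an assumption made for illustration: the names (`slot_t`, `lp_map_t`, `lp_put`, `lp_get`), 32-bit integer keys, a `NULL` value marking an empty slot, a power-of-two capacity, and a simple multiplicative hash. There is no deletion and no resizing; the caller picks the array size and therefore the load factor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot layout; value == NULL marks an empty slot. */
typedef struct { uint32_t key; void *value; } slot_t;

typedef struct {
    slot_t *slots;  /* caller-provided, zero-initialised array */
    size_t  cap;    /* number of slots, must be a power of two */
    size_t  len;    /* number of occupied slots */
} lp_map_t;

/* Simple multiplicative mix; any reasonable integer hash would do. */
static size_t lp_hash(uint32_t k)
{
    k *= 2654435761u;
    return (size_t)(k ^ (k >> 16));
}

/* Insert or overwrite. Returns 0 on success, -1 if the table is full. */
static int lp_put(lp_map_t *m, uint32_t k, void *v)
{
    size_t mask = m->cap - 1;
    size_t i = lp_hash(k) & mask;
    size_t n;
    assert(v != NULL);                       /* NULL is reserved for "empty" */
    assert((m->cap & mask) == 0);            /* capacity is a power of two */
    for (n = 0; n < m->cap; n++, i = (i + 1) & mask) {
        if (m->slots[i].value == NULL) {     /* free slot: claim it */
            m->slots[i].key = k;
            m->slots[i].value = v;
            m->len++;
            return 0;
        }
        if (m->slots[i].key == k) {          /* existing key: overwrite */
            m->slots[i].value = v;
            return 0;
        }
    }
    return -1;
}

/* Returns the stored value, or NULL when the key is absent. */
static void *lp_get(const lp_map_t *m, uint32_t k)
{
    size_t mask = m->cap - 1;
    size_t i = lp_hash(k) & mask;
    size_t n;
    for (n = 0; n < m->cap; n++, i = (i + 1) & mask) {
        if (m->slots[i].value == NULL) return NULL;  /* hit a gap: not present */
        if (m->slots[i].key == k) return m->slots[i].value;
    }
    return NULL;
}
```

Because deletion isn’t needed, no tombstone handling is required, which keeps the code small. Probe sequences get longer as the table fills, so how close to full you can run it is a trade-off between memory and lookup time.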
If you want a binary search tree that has excellent performance guarantees and isn’t too hard to code up, consider looking into splay trees. They guarantee amortized O(lg n) lookups and insertions, and require just two child pointers per node. The rebalancing step is also substantially simpler than in most balanced BSTs.
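As a rough illustration of the splay tree suggestion, here is a sketch built on the well-known top-down splay routine. The names (`st_node_t`, `st_splay`, `st_insert`, `st_get`), the 32-bit key, and the convention that the caller allocates nodes are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: two child pointers plus the key/value pair. */
typedef struct st_node {
    struct st_node *left, *right;
    uint32_t key;
    void *value;
} st_node_t;

/* Top-down splay: brings the node with key k (or the last node on the
 * search path if k is absent) to the root and returns the new root. */
static st_node_t *st_splay(st_node_t *t, uint32_t k)
{
    st_node_t header = { NULL, NULL, 0, NULL };
    st_node_t *l = &header, *r = &header, *y;
    if (t == NULL) return NULL;
    for (;;) {
        if (k < t->key) {
            if (t->left == NULL) break;
            if (k < t->left->key) {          /* rotate right */
                y = t->left; t->left = y->right; y->right = t; t = y;
                if (t->left == NULL) break;
            }
            r->left = t; r = t; t = t->left; /* link into right tree */
        } else if (k > t->key) {
            if (t->right == NULL) break;
            if (k > t->right->key) {         /* rotate left */
                y = t->right; t->right = y->left; y->left = t; t = y;
                if (t->right == NULL) break;
            }
            l->right = t; l = t; t = t->right; /* link into left tree */
        } else {
            break;
        }
    }
    l->right = t->left;                      /* reassemble the three parts */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

/* Insert caller-allocated node n and return the new root. If the key is
 * already present, its value is overwritten and n is left unused. */
static st_node_t *st_insert(st_node_t *root, st_node_t *n)
{
    assert(n != NULL);
    n->left = n->right = NULL;
    if (root == NULL) return n;
    root = st_splay(root, n->key);
    if (n->key < root->key) {
        n->left = root->left; n->right = root; root->left = NULL;
    } else if (n->key > root->key) {
        n->right = root->right; n->left = root; root->right = NULL;
    } else {
        root->value = n->value;
        return root;
    }
    return n;
}

/* Lookup; splays, so the root pointer may change even on a read. */
static void *st_get(st_node_t **root, uint32_t k)
{
    if (*root == NULL) return NULL;
    *root = st_splay(*root, k);
    return ((*root)->key == k) ? (*root)->value : NULL;
}
```

Note that lookups modify the tree, and that the bound is amortized: an individual operation can still take O(n) time.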