I’m currently working on an embedded device project where I’m running into performance problems. Profiling has located an O(N) operation that I’d like to eliminate.
I basically have two arrays int A[N] and short B[N]. Entries in A are unique and ordered by external constraints. The most common operation is to check if a particular value a appears in A[]. Less frequently, but still common is a change to an element of A[]. The new value is unrelated to the previous value.
Since the most common operation is the find, that’s where B[] comes in. It’s a sorted array of indices in A[], such that A[B[i]] < A[B[j]] if and only if i<j. That means that I can find values in A using a binary search.
Of course, when I update A[k], I have to find k in B and move it to a new position, to maintain the search order. Since I know the old and new values of A[k], that’s just a memmove() of a subset of B[] between the old and new position of k. This is the O(N) operation that I need to fix; since the old and new values of A[k] are essentially random I’m moving on average about N/2 N/3 elements.
I looked into std::make_heap using [](int i, int j) { return A[i] < A[j]; } as the predicate. In that case I can easily make B[0] point to the smallest element of A, and updating B is now a cheap O(log N) rebalancing operation. However, I generally don’t need the smallest value of A, I need to find if any given value is present. And that’s now a O(N log N) search in B. (Half of my N elements are at heap depth log N, a quarter at (log N)-1, etc), which is no improvement over a dumb O(N) search directly in A.
Considering that std::set has O(log N) insert and find, I’d say that it should be possible to get the same performance here for update and find. But how do I do that? Do I need another order for B? A different type?
B is currently a short [N] because A and B together are about the size of my CPU cache, and my main memory is a lot slower. Going from 6*N to 8*N bytes would not be nice, but still acceptable if my find and update go to O(log N) both.
If the only operations are (1) check if value ‘a’ belongs to A and (2) update values in A, why don’t you use a hash table in place of the sorted array B? Especially if A does not grow or shrink in size and the values only change this would be a much better solution. A hash table does not require significantly more memory than an array. (Alternatively, B should be changed not to a heap but to a binary search tree, that could be self-balancing, e.g. a splay tree or a red-black tree. However, trees require extra memory because of the left- and right-pointers.)
A practical solution that grows memory use from 6N to 8N bytes is to aim for exactly 50% filled hash table, i.e. use a hash table that consists of an array of 2N shorts. I would recommend implementing the Cuckoo Hashing mechanism (see http://en.wikipedia.org/wiki/Cuckoo_hashing). Read the article further and you find that you can get load factors above 50% (i.e. push memory consumption down from 8N towards, say, 7N) by using more hash functions. “Using just three hash functions increases the load to 91%.“
From Wikipedia: