I have a problem and I want to make sure I am solving it as efficiently as possible. I have an array A of N float values, all between 0 and 1.
I have to find the top k values that can be formed as a product of at most three numbers from A. So the top-k list can
contain individual numbers from A, products of two numbers, or products of three numbers from A.
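A brute-force reference is handy for checking any faster method. This is a minimal Python sketch, assuming that "product of two/three numbers" means factors taken from distinct positions in A. The name `top_k_products_brute` is hypothetical, and the O(N³) cost makes it suitable only for small test inputs.

```python
import heapq
import math
from itertools import combinations

def top_k_products_brute(a, k):
    """Score every product of 1, 2 or 3 entries of `a` and keep the k largest.

    Assumption: the factors come from distinct positions in `a`.
    O(N^3) work, so this is only a reference for small test inputs.
    """
    candidates = [
        math.prod(combo)
        for size in (1, 2, 3)
        for combo in combinations(a, size)
    ]
    return heapq.nlargest(k, candidates)   # largest first

print(top_k_products_brute([0.5, 0.25, 0.75], 4))  # [0.75, 0.5, 0.375, 0.25]
```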
This is how I am doing it now. I can get the top k numbers in descending order in O(N log k) time. Call this sorted
(descending) array of k values B, and refer to numbers by their index in B. I then create a max-heap and initialize it
with the best candidate of each size up to 3, i.e. the products at indices (0), (0,1) and (0,1,2). Next, I extract from
the heap, and whenever I extract a size-z value (a product of z numbers), I replace it with the next possible size-z
candidates. For example, if (2,4) is extracted, I replace it with (3,4) and (2,5). I extract k times to get the results.
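A minimal Python sketch of the approach described above, under a few assumptions: `heapq.nlargest` stands in for the O(N log k) selection, values are positive, and factors are distinct positions in B. The successor rule can reach the same index tuple twice ((2,5) follows both (2,4) and (1,5)), so the sketch keeps a `seen` set; that set is an addition of this sketch, not part of the described method. The function name is hypothetical.

```python
import heapq
import math

def top_k_products_question(a, k):
    """Distinct-factor version of the approach above (a sketch).

    Index tuples are strictly increasing positions in B; `seen` avoids pushing
    a tuple twice, since e.g. (2, 5) follows both (2, 4) and (1, 5).
    """
    b = heapq.nlargest(k, a)              # top-k values, descending: O(N log k)
    m = len(b)
    heap, seen, out = [], set(), []

    def push(idx):
        if idx not in seen:
            seen.add(idx)
            # heapq is a min-heap, so the key is the negated product
            heapq.heappush(heap, (-math.prod(b[i] for i in idx), idx))

    for size in (1, 2, 3):                # seed with (0), (0, 1), (0, 1, 2)
        if size <= m:
            push(tuple(range(size)))

    while heap and len(out) < k:
        neg, idx = heapq.heappop(heap)
        out.append(-neg)
        # Next candidates of the same size: bump one index by 1 while
        # keeping the indices strictly increasing and inside B
        for pos in range(len(idx)):
            nxt = idx[pos] + 1
            limit = idx[pos + 1] if pos + 1 < len(idx) else m
            if nxt < limit:
                push(idx[:pos] + (nxt,) + idx[pos + 1:])
    return out

print(top_k_products_question([0.5, 0.25, 0.75], 4))  # [0.75, 0.5, 0.375, 0.25]
```

For (2,4) the loop yields exactly (3,4) and (2,5), matching the replacement rule in the question.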
Is there a better way to do this?
Thanks all.
If I understand you correctly, you need to find the k highest numbers that can be produced by multiplying together 1, 2 or 3 elements from your list, where all the values are floating-point numbers between 0 and 1.
It is clear that you only need to consider the k highest numbers from the list: any product that involves a smaller number is at most that number (the other factors are at most 1), and the k highest singletons are already at least as large, so the rest can be discarded straight away. You can use your O(n log k) algorithm to get them, again in sorted order (I assume your list isn’t presorted). To simplify the problem, you can now take their logarithms and maximize sums of the numbers instead of the original products. This might speed things up a little.
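A small Python sketch of these two steps, assuming every value is strictly positive (log 0 is undefined); `top_k_logs` is a hypothetical name. `heapq.nlargest` maintains a heap of size k, which gives the O(n log k) bound, and because log is increasing the descending order survives the transform.

```python
import heapq
import math

def top_k_logs(values, k):
    """Return log(x) for the k largest x, in descending order.

    Assumes every x > 0 (log 0 is undefined).
    """
    top = heapq.nlargest(k, values)       # heap of size k: O(n log k)
    return [math.log(x) for x in top]     # log is increasing, order survives

logs = top_k_logs([0.5, 0.25, 0.75, 0.125], 2)
# A product of kept values is now a sum: log(0.75 * 0.5) equals
# logs[0] + logs[1] up to floating-point rounding
```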
Now (in the logarithmic representation) all your numbers are negative, or 0 for a value of exactly 1, so adding more of them together can only keep the sum the same or make it more negative.
Let’s call the k highest numbers A1…Ak, sorted so that A1 ≥ A2 ≥ … ≥ Ak. We can reduce the problem further by assuming that there also exists a number A0 with the value 0 in the log representation (1 in the original representation). The problem is then to enumerate the first k 3-tuples (x, y, z, each in {A0,…,Ak}) under the constraint that x ≥ y ≥ z and z < A0. Let’s denote a 3-tuple by its indices [i,j,n], so that i ≤ j ≤ n and n ≥ 1, and the sum of its elements by S[i,j,n] = Ai + Aj + An. The first element to be reported is obviously [0,0,1], i.e. A0 + A0 + A1 = A1, which corresponds in the original problem formulation to the singleton #1 value on the list.
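To make the reduced problem concrete, here is a brute-force Python sketch (hypothetical name `reduced_triples_brute`) that prepends A0 = 0 to the log values and sorts every index triple [i, j, n] with i ≤ j ≤ n and n ≥ 1 by S[i, j, n]. Note that i ≤ j ≤ n allows repeated indices, so [0,1,1] stands for A1 + A1, i.e. A1 squared in the original values; if each element of A may be used at most once, the non-zero indices would have to be strictly increasing instead.

```python
def reduced_triples_brute(logs):
    """All triples [i, j, n] with i <= j <= n and n >= 1, best sum first.

    `logs` holds A1..Ak in descending order; A0 = 0 is prepended.
    O(k^3), so this is only a reference for checking faster code.
    """
    a = [0.0] + list(logs)                # a[0] is A0 = log(1)
    k = len(logs)
    triples = [(i, j, n)
               for n in range(1, k + 1)   # n >= 1: at least one real factor
               for j in range(n + 1)
               for i in range(j + 1)]
    # S[i, j, n] = A_i + A_j + A_n; a larger sum means a larger product
    triples.sort(key=lambda t: a[t[0]] + a[t[1]] + a[t[2]], reverse=True)
    return triples

print(reduced_triples_brute([-1.0, -3.0, -8.0, -9.0])[:3])
```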
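We use a max-heap as in the original formulation, pushing triples onto the heap with their sums S[…] as the ordering key. The algorithm starts by pushing [0,0,0] onto the heap. Then, k + 1 times, it pops the triple with the largest sum, appends it to answer, and pushes that triple’s successors onto the heap.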
At the end, answer contains k + 1 elements; the first of them is [0,0,0], which must be discarded.
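One way to realize this loop, as a Python sketch with an assumed successor rule: from [i, j, n] push [i, j, n+1]; if j = n, also push [i, j+1, n+1]; and if additionally i = j, push [i+1, j+1, n+1]. Under this rule every triple with n ≥ 1 has exactly one parent, which is what prevents duplicates, and no child has a larger sum than its parent, so triples pop in non-increasing order of S. The name `top_k_triples` is hypothetical; it also reports the largest heap size seen.

```python
import heapq

def top_k_triples(logs, k):
    """Best-first enumeration of index triples [i, j, n] over A0 = 0, A1..Am.

    `logs` must be sorted in descending order (all <= 0). Returns the k best
    triples, with the initial [0, 0, 0] dropped, and the largest heap size seen.
    """
    a = [0.0] + list(logs)                # a[0] plays the role of A0
    m = len(logs)                         # largest valid index

    def push(t):
        # heapq is a min-heap, so store -S[t] to pop the largest sum first
        heapq.heappush(heap, (-(a[t[0]] + a[t[1]] + a[t[2]]), t))

    heap = []
    push((0, 0, 0))
    answer, max_heap = [], 1
    while heap and len(answer) < k + 1:
        _, (i, j, n) = heapq.heappop(heap)
        answer.append((i, j, n))
        if n < m:                         # assumed successor rule
            push((i, j, n + 1))
            if j == n:
                push((i, j + 1, n + 1))
                if i == j:
                    push((i + 1, j + 1, n + 1))
        max_heap = max(max_heap, len(heap))
    return answer[1:], max_heap           # discard the leading [0, 0, 0]

triples, peak = top_k_triples([-1.0, -3.0, -8.0, -9.0], 4)
```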
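For example, let A1…A4 be given as -1, -3, -8, -9 (so k = 4). The algorithm then pops [0,0,0] (sum 0), [0,0,1] (sum -1) and [0,1,1] (sum -2), followed by [0,0,2] and [1,1,1], which tie at sum -3; discarding [0,0,0] leaves the four answers.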
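A Python sketch (hypothetical `trace` helper) that records each pop on this example together with the heap contents right after the pushes. The successor rule is an assumption: from [i, j, n] push [i, j, n+1], plus [i, j+1, n+1] when j = n, plus [i+1, j+1, n+1] when also i = j.

```python
import heapq

def trace(logs, k):
    """Record (popped triple, its sum S, sorted heap contents) for k + 1 pops."""
    a = [0.0] + list(logs)                # a[0] is A0 = 0
    m = len(logs)
    heap = [(0.0, (0, 0, 0))]             # key is -S because heapq is a min-heap
    steps = []
    for _ in range(k + 1):
        _, (i, j, n) = heapq.heappop(heap)
        if n < m:                         # assumed successor rule
            kids = [(i, j, n + 1)]
            if j == n:
                kids.append((i, j + 1, n + 1))
                if i == j:
                    kids.append((i + 1, j + 1, n + 1))
            for t in kids:
                heapq.heappush(heap, (-(a[t[0]] + a[t[1]] + a[t[2]]), t))
        s = a[i] + a[j] + a[n]
        steps.append(((i, j, n), s, sorted(t for _, t in heap)))
    return steps

for popped, s, frontier in trace([-1.0, -3.0, -8.0, -9.0], 4):
    print(popped, s, frontier)
```

The tie at -3 between [0,0,2] and [1,1,1] is broken here by Python’s comparison of the index tuples, so [0,0,2] pops first.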
The nice thing about this algorithm is that it doesn’t enumerate duplicates and the heap size is O(k). To see why, observe that on every iteration the heap grows by at most two elements (often less); starting from a single element, after k iterations there cannot be more than 2k + 1 elements in the heap.
This gives a running time of O(n log k + k log k) = O((n + k) log k).
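Putting the pieces together, here is an end-to-end Python sketch (hypothetical names `top_k_products` and `brute`): the O(n log k) selection, the O(k log k) heap search, and a final exp to map sums back to products. It follows the triple formulation, so factors may repeat, and the brute-force check uses `itertools.combinations_with_replacement` to match that. The successor rule is the same assumption as before (push [i, j, n+1]; also [i, j+1, n+1] if j = n; also [i+1, j+1, n+1] if i = j too). With k + 1 pops, counting [0,0,0], the heap bound becomes 2k + 3.

```python
import heapq
import itertools
import math
import random

def top_k_products(values, k):
    """Top-k products of 1-3 factors (repetition allowed, as with i <= j <= n).

    Assumes 0 < value <= 1. Returns (products, largest heap size).
    """
    top = heapq.nlargest(k, values)                 # O(n log k), descending
    a = [0.0] + [math.log(x) for x in top]          # a[0] = log(1) = A0
    m = len(top)
    heap, out, max_heap = [(0.0, (0, 0, 0))], [], 1
    while heap and len(out) < k + 1:                # O(k log k) overall
        neg, (i, j, n) = heapq.heappop(heap)
        out.append(math.exp(-neg))                  # log sum -> product
        if n < m:                                   # assumed successor rule
            kids = [(i, j, n + 1)]
            if j == n:
                kids.append((i, j + 1, n + 1))
                if i == j:
                    kids.append((i + 1, j + 1, n + 1))
            for t in kids:
                heapq.heappush(heap, (-(a[t[0]] + a[t[1]] + a[t[2]]), t))
        max_heap = max(max_heap, len(heap))
    return out[1:], max_heap                        # drop the empty product 1.0

def brute(values, k):
    """Exhaustive check with the same semantics: 1-3 factors, repeats allowed."""
    prods = [math.prod(c) for r in (1, 2, 3)
             for c in itertools.combinations_with_replacement(values, r)]
    return heapq.nlargest(k, prods)

# Randomized self-check against brute force, plus the heap-size bound
random.seed(0)
for _ in range(200):
    vals = [random.uniform(0.01, 1.0) for _ in range(random.randint(1, 12))]
    k = random.randint(1, len(vals))
    got, peak = top_k_products(vals, k)
    want = brute(vals, k)
    assert len(got) == len(want)
    assert all(math.isclose(g, w) for g, w in zip(got, want))
    assert peak <= 2 * k + 3   # k + 1 pops, each adding at most 2 net
```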