I am trying to find the top 4 maximum values in an integer array. For example, the input array {1232, -1221, 0, 345, 78, 99} should return {1232, 345, 99, 78} as the top 4 maximum values. I have solved the requirement with the method below, but I am still not satisfied with its time efficiency. Is there any way to optimize the method further as the input becomes larger? Any clues are really appreciated. Thank you.
public int[] findTopFourMax(int[] input) {
    int[] topFourList = { Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE };
    for (int current : input) {
        if (current > topFourList[0]) {
            topFourList[3] = topFourList[2];
            topFourList[2] = topFourList[1];
            topFourList[1] = topFourList[0];
            topFourList[0] = current;
        } else if (current > topFourList[1]) {
            topFourList[3] = topFourList[2];
            topFourList[2] = topFourList[1];
            topFourList[1] = current;
        } else if (current > topFourList[2]) {
            topFourList[3] = topFourList[2];
            topFourList[2] = current;
        } else if (current > topFourList[3]) {
            topFourList[3] = current;
        }
    }
    return topFourList;
}
The simplest (though not the most efficient) way is to sort the array and take the subarray containing the last 4 elements.
You can use Arrays.sort() to sort and Arrays.copyOfRange() to take the subarray.

For a more efficient solution, one can maintain a min-heap of the top K elements, or use a selection algorithm to find the 4th largest element. The two approaches are described in this thread.
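To make the sorting and min-heap ideas concrete, here is a minimal sketch of both. The class name TopK and the method names topKBySorting and topKByHeap are made up for illustration, and this is not the code from the linked thread. Both methods return the k largest values in descending order, which is an assumption chosen to match the output format in the question.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class TopK {
    // Sort-based approach: O(n log n). Sorts a copy, then takes the last k elements.
    static int[] topKBySorting(int[] input, int k) {
        if (k <= 0) return new int[0];
        int[] sorted = input.clone();              // leave the caller's array untouched
        Arrays.sort(sorted);                       // ascending order
        int size = Math.min(k, sorted.length);
        int[] tail = Arrays.copyOfRange(sorted, sorted.length - size, sorted.length);
        // Reverse in place so the largest value comes first.
        for (int i = 0, j = tail.length - 1; i < j; i++, j--) {
            int tmp = tail[i];
            tail[i] = tail[j];
            tail[j] = tmp;
        }
        return tail;
    }

    // Heap-based approach: O(n log k). Keeps a min-heap holding the k largest values seen so far.
    static int[] topKByHeap(int[] input, int k) {
        if (k <= 0) return new int[0];
        PriorityQueue<Integer> heap = new PriorityQueue<>(k); // natural ordering gives a min-heap
        for (int value : input) {
            if (heap.size() < k) {
                heap.add(value);
            } else if (value > heap.peek()) {
                heap.poll();                       // evict the smallest of the current top k
                heap.add(value);
            }
        }
        // Polling yields ascending order, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] input = {1232, -1221, 0, 345, 78, 99};
        System.out.println(Arrays.toString(topKBySorting(input, 4)));
        System.out.println(Arrays.toString(topKByHeap(input, 4)));
    }
}
```

Note that the heap version boxes each stored int into an Integer; for k = 4 the hand-written comparisons in the question avoid that overhead, so it is worth measuring before switching.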
Though the selection algorithm offers an O(n) solution, the min-heap solution (which is O(n log K)) should have better constants, and especially for small k it is likely to be faster.

P.S. (EDIT):
For 4 elements, you might find that running a loop 4 times, finding the max in each pass (and changing the old max to -infinity after each pass), is more efficient than the more “complex” approaches, since it requires only sequential reads and has fairly small constants. This is of course not true for larger k, and it decays into O(n^2) for k->n.

EDIT2: benchmarking:
For the fun of it, I ran a benchmark on the attached code. The results are:
We can see that the naive and heap approaches are much better than the sort-based approach, and the naive approach is slightly slower than the heap-based one. I did a Wilcoxon test to check whether the difference between naive and heap is statistically significant, and I got a p-value of 3.4573e-17. This means that, if the two approaches were “identical”, the probability of observing a difference at least this large would be 3.4573e-17 (extremely small). From this we can conclude that the heap-based solution gives better performance than the naive and sorting solutions (and we empirically proved it!).

Attachment: The code I used: