While iterating through a dataset, what’s the best way to keep track of only the top 10 numbers so far, in sorted order?
Solution…Ended up implementing Generic Min and Max Heaps…as sadly they are not available in Java libraries or readily on the internet….No gauruntees on the code…
import java.util.ArrayList;
public class MaxHeapGeneric <K extends Comparable> {
//ArrayList to hold the heap
ArrayList<K> h = new ArrayList<K>();
public MaxHeapGeneric()
{
}
public int getSize()
{
return h.size();
}
private K get(int key){
return h.get(key);
}
public void add(K key){
h.add(null);
int k = h.size() - 1;
while (k > 0){
int parent = (k-1)/2;
K parentValue = h.get(parent);
//MaxHeap -
//for minheap - if(key > parentValue)
if(key.compareTo(parentValue) <= 0) break;
h.set(k, parentValue);
k = parent;
}
h.set(k, key);
}
public K getMax()
{
return h.get(0);
}
public void percolateUp(int k, K key){
if(h.isEmpty())
return ;
while(k < h.size() /2){
int child = 2*k + 1; //left child
if( child < h.size() -1 && (h.get(child).compareTo(h.get(child+1)) < 0) )
{
child++;
}
if(key.compareTo(h.get(child)) >=0) break;
h.set(k, h.get(child));
k = child;
}
h.set(k, key);
}
public K remove()
{
K removeNode = h.get(0);
K lastNode = h.remove(h.size() - 1);
percolateUp(0, lastNode);
return removeNode;
}
public boolean isEmpty()
{
return h.isEmpty();
}
public static void main(String[] args)
{
MaxHeapGeneric<Integer> test = new MaxHeapGeneric<Integer>();
test.add(5);
test.add(9);
test.add(445);
test.add(1);
test.add(534);
test.add(23);
while(!test.isEmpty())
{
System.out.println(test.remove());
}
}
}
And a min heap
import java.util.ArrayList;
public class MinHeapGeneric <K extends Comparable> {
//ArrayList to hold the heap
ArrayList<K> h = new ArrayList<K>();
public MinHeapGeneric()
{
}
public int getSize()
{
return h.size();
}
private K get(int key){
return h.get(key);
}
public void add(K key){
h.add(null);
int k = h.size() - 1;
while (k > 0){
int parent = (k-1)/2;
K parentValue = h.get(parent);
//for minheap - if(key > parentValue)
if(key.compareTo(parentValue) > 0) break;
h.set(k, parentValue);
k = parent;
}
h.set(k, key);
}
public K getMax()
{
return h.get(0);
}
public void percolateUp(int k, K key){
if(h.isEmpty())
return ;
while(k < h.size() /2){
int child = 2*k + 1; //left child
if( child < h.size() -1 && (h.get(child).compareTo(h.get(child+1)) >= 0) )
{
child++;
}
if(key.compareTo(h.get(child)) < 0) break;
h.set(k, h.get(child));
k = child;
}
h.set(k, key);
}
public K remove()
{
K removeNode = h.get(0);
K lastNode = h.remove(h.size() - 1);
percolateUp(0, lastNode);
return removeNode;
}
public boolean isEmpty()
{
return h.isEmpty();
}
public static void main(String[] args)
{
MinHeapGeneric<Integer> test = new MinHeapGeneric<Integer>();
test.add(5);
test.add(9);
test.add(445);
test.add(1);
test.add(534);
test.add(23);
while(!test.isEmpty())
{
System.out.println(test.remove());
}
}
}
Use a min-heap (priority queue) to keep track of the top 10 items. With a binary heap, the time complexity is O(N log M), where N is the number of items and M is 10.
Compared to storing the top items in an array, this is faster for large M: array-based approach is O(NM). Ditto for linked lists.
In pseudocode:
Now heap contains 10 largest items. To pop: