I have a file that has 1,000,000 float values in it. I need to find the 10,000 largest values.
I was thinking of:
- Reading the file
- Converting the strings to floats
- Placing the floats into a max-heap (a heap where the largest value is the root)
- After all values are in the heap, removing the root 10,000 times and adding those values to a list/arraylist.
I know I will have
- 1,000,000 inserts into the heap
- 10,000 removals from the heap
- 10,000 inserts into the return list
Would this be a good solution? This is for a homework assignment.
Your solution is mostly good. It’s basically a heapsort that stops after extracting K elements, which improves the running time from O(N log N) (for a full sort) to O(N + K log N). Here N = 1000000 and K = 10000.
However, you should not do N inserts into the heap initially, as this would take O(N log N). Instead, use a heapify operation, which turns an array into a heap in linear time.
If the K numbers don’t need to be sorted, you can find the Kth largest number in linear time using a selection algorithm, and then output it together with all numbers larger than it. This gives an O(N) solution.
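Both variants can be sketched in a few lines. This is only an illustration, assuming the values have already been read from the file and converted to floats. The function names `top_k_heapify` and `top_k_select` are made up for this example. Python's `heapq` is a min-heap, so the first function negates the values to get max-heap behaviour. The second uses quickselect with a random pivot, which is linear in expectation rather than in the worst case (a median-of-medians pivot would be needed for a worst-case guarantee).

```python
import heapq
import random


def top_k_heapify(values, k):
    """Return the k largest values in descending order: O(N + K log N)."""
    # heapq implements a min-heap, so negate the values to simulate a max-heap.
    heap = [-v for v in values]
    heapq.heapify(heap)  # builds the heap in linear time, no repeated inserts
    count = max(0, min(k, len(heap)))
    # Each pop removes the current maximum and costs O(log N).
    return [-heapq.heappop(heap) for _ in range(count)]


def top_k_select(values, k):
    """Return the k largest values in arbitrary order: expected O(N)."""
    items = list(values)  # work on a copy so the caller's data is untouched
    if k <= 0:
        return []
    if k >= len(items):
        return items
    # In sorted order, the k largest values occupy indices target..end.
    target = len(items) - k
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style partition: smaller values move left, larger move right.
        while i <= j:
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Continue only in the part that still contains the target index.
        if target <= j:
            hi = j
        elif target >= i:
            lo = i
        else:
            break  # target sits among values equal to the pivot
    return items[target:]
```

For example, `top_k_heapify([3.5, 1.0, 9.2, 7.1, 4.4], 2)` returns `[9.2, 7.1]`, while `top_k_select` on the same input returns those two values in some order. Slicing from `target` rather than filtering by "greater than the Kth value" keeps the result at exactly K items even when the input contains duplicates.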