Reading Guido’s infamous answer to the question “Sorting a million 32-bit integers in 2MB of RAM using Python”, I discovered the module heapq.
I also discovered that I didn’t understand jack about it, nor did I know what I could do with it.
Can you explain to me (with the proverbial six-year-old as the target audience) what the heap queue algorithm is for and what you can do with it?
Can you provide a simple Python snippet where using it (with the heapq module) solves a problem that is better solved with it than with something else?
`heapq` implements binary heaps, which are a partially sorted data structure. In particular, they have three interesting operations:

- `heapify` turns a list into a heap, in place, in O(n) time;
- `heappush` adds an element to the heap in O(lg n) time;
- `heappop` removes and returns the smallest element of the heap in O(lg n) time.

Many interesting algorithms rely on heaps for performance. The simplest one is probably partial sorting: getting the k smallest (or largest) elements of a list without sorting the entire list.
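As a minimal sketch of those three operations (the list `data` and its values are made up purely for illustration):

```python
import heapq

data = [5, 1, 8, 3, 2]

# heapify rearranges the list in place so that data[0] is the smallest item.
heapq.heapify(data)
first = data[0]

# heappush adds an item while keeping the heap property.
heapq.heappush(data, 0)

# heappop removes and returns the smallest item.
smallest = heapq.heappop(data)

# Popping until the heap is empty yields the items in ascending order.
drained = [heapq.heappop(data) for _ in range(len(data))]
```

Note that a heap is only partially ordered: after `heapify`, the list as a whole is generally not sorted; only the smallest element is guaranteed to sit at index 0.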
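`heapq.nsmallest` (and `heapq.nlargest`) implement partial sorting. The implementation of `nlargest` can be paraphrased as: build a heap from the first n elements of the input list `l`, then, for each remaining element, push it onto the heap and pop the smallest element off, so that the heap always holds the n largest elements seen so far.

Analysis: let N be the number of elements in `l`. `heapify` is run once, for a cost of O(n); that’s negligible. Then, in a loop running N-n = O(N) times, we perform a `heappop` and a `heappush` at O(lg n) cost each, giving a total running time of O(N lg n). When N >> n, this is a big win compared to the other obvious algorithm, `sorted(l, reverse=True)[:n]`, which takes O(N lg N) time.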
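The paraphrase analysed above could be sketched roughly as follows. This is not the actual CPython source, just an illustration of the idea; the name `nlargest_sketch` is hypothetical, and the code assumes `l` is a list (or another sliceable sequence).

```python
import heapq

def nlargest_sketch(n, l):
    """Return the n largest elements of l, largest first (illustrative only)."""
    if n <= 0:
        return []

    # Build a min-heap from the first n elements: O(n).
    heap = list(l[:n])
    heapq.heapify(heap)

    # For each of the remaining elements, push it and then pop the smallest,
    # so the heap keeps holding the n largest elements seen so far.
    # Each push/pop costs O(lg n) because the heap never exceeds n + 1 items.
    for x in l[n:]:
        heapq.heappush(heap, x)
        heapq.heappop(heap)

    # The heap is only partially ordered, so sort the n survivors: O(n lg n).
    return sorted(heap, reverse=True)

values = [5, 1, 8, 3, 2, 9, 7]
top3 = nlargest_sketch(3, values)  # [9, 8, 7], same as heapq.nlargest(3, values)
```

In real code you would simply call `heapq.nlargest(n, l)`. The push-then-pop pair can also be written as a single `heapq.heappushpop(heap, x)` call.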