I’m looking for a more efficient way to reprioritize items in a priority queue. I have a (quite naive) priority queue implementation based on heapq. The relevant parts look like this:
from heapq import heapify, heappop

class pq(object):
    def __init__(self, init=None):
        self.inner, self.item_f = [], {}
        if init is not None:
            self.inner = [[priority, item] for item, priority in enumerate(init)]
            heapify(self.inner)
            self.item_f = {pi[1]: pi for pi in self.inner}

    def top_one(self):
        if not len(self.inner): return None
        priority, item = heappop(self.inner)
        del self.item_f[item]
        return item, priority

    def re_prioritize(self, items, prioritizer=lambda x: x + 1):
        for item in items:
            if item not in self.item_f: continue
            entry = self.item_f[item]
            entry[0] = prioritizer(entry[0])
        heapify(self.inner)
And here is a simple coroutine that demonstrates the reprioritization pattern of my real application.
def fecther(priorities, prioritizer=lambda x: x + 1):
    q = pq(priorities)
    for k in xrange(len(priorities) + 1):
        items = (yield k, q.top_one())
        if items is not None:
            q.re_prioritize(items, prioritizer)
With testing
if __name__ == '__main__':
    def gen_tst(n=3):
        priorities = range(n)
        priorities.reverse()
        priorities = priorities + range(n)
        def tst():
            result, f = range(2 * n), fecther(priorities)
            k, item_t = f.next()
            while item_t is not None:
                result[k] = item_t[0]
                k, item_t = f.send(range(item_t[0]))
            return result
        return tst
producing:
In []: gen_tst()()
Out[]: [2, 3, 4, 5, 1, 0]
In []: t= gen_tst(123)
In []: %timeit t()
10 loops, best of 3: 26 ms per loop
Now, my question is: does there exist any data structure that would avoid the calls to heapify(.) when reprioritizing the priority queue? I’m willing to trade memory for speed, but it should be possible to implement it in pure Python (obviously with much better timings than my naive implementation).
Update:
To help you understand the specific case better, let’s assume that no items are added to the queue after the initial (batch) pushes, and that every fetch (pop) from the queue then generates a number of reprioritizations roughly like this scheme:
- 0*n, very seldom
- 0.05*n, typically
- n, very seldom
where n is the current number of items in the queue. Thus, in any round, there are only relatively few items to reprioritize. So I’m hoping that there could exist a data structure that is able to exploit this pattern and therefore outperform the cost of the mandatory heapify(.) in every round (needed in order to satisfy the heap invariant).
Update 2:
So far it seems that the heapify(.) approach is indeed quite efficient (relatively speaking). All the alternatives I have been able to figure out need to use heappush(.), and it seems to be more expensive than I originally anticipated. (Anyway, if the state of the issue remains like this, I’m forced to find a better solution outside the Python realm.)
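For reference, one common heappush(.)-based alternative is lazy invalidation: the old entry is flagged as stale and a fresh entry is pushed, so no heapify(.) call is needed. The following is only a minimal sketch of that idea, not necessarily the variant tried above. The class name LazyPQ and the third "alive" slot in each entry are assumptions made for illustration, and items are assumed to be unique, hashable and mutually comparable (they are integers here, as in pq).

```python
from heapq import heapify, heappop, heappush


class LazyPQ(object):
    """Sketch: reprioritize by invalidating the old entry and pushing a new one."""

    def __init__(self, priorities=()):
        # Entries are [priority, item, alive]; item is the index into
        # priorities, mirroring the pq class in the question.
        self.inner = [[p, i, True] for i, p in enumerate(priorities)]
        heapify(self.inner)
        self.item_f = {entry[1]: entry for entry in self.inner}

    def top_one(self):
        # Pop until a live entry appears; stale entries are simply dropped.
        while self.inner:
            priority, item, alive = heappop(self.inner)
            if alive:
                del self.item_f[item]
                return item, priority
        return None

    def re_prioritize(self, items, prioritizer=lambda x: x + 1):
        for item in items:
            old = self.item_f.get(item)
            if old is None:
                continue
            old[2] = False  # flag the old entry as stale, O(1)
            new = [prioritizer(old[0]), item, True]
            self.item_f[item] = new
            heappush(self.inner, new)  # O(log n) per changed item
```

Each reprioritized item costs one heappush(.) and leaves a stale entry behind, so the heap grows until those entries are popped. Whether this beats a single heapify(.) per round depends on how many items change per round and would have to be measured.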
Since the new prioritization function may have no relationship to the previous one, you have to pay the cost to get the new ordering (and it’s at minimum O(n) just to find the minimum element in the new ordering). If you have a small, fixed number of prioritization functions and switch frequently between them, then you could benefit from keeping a separate heap going for each function (although not with heapq, because it doesn’t support cheaply locating and removing an object from the middle of a heap).
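To illustrate what "cheaply locating and removing an object from the middle of a heap" could look like, here is a rough sketch of a hand-written binary heap that keeps an item-to-index map next to the heap list. The names IndexedHeap, pos, remove and re_prioritize are made up for this example and are not part of heapq. Items are assumed to be unique, hashable and mutually comparable. Being pure Python, the sift loops carry interpreter overhead that the C-accelerated heapq functions avoid, so any real gain would need to be timed.

```python
class IndexedHeap(object):
    """Sketch: binary min-heap of [priority, item] plus an item -> index map."""

    def __init__(self, priorities=()):
        self.heap = [[p, i] for i, p in enumerate(priorities)]
        self.pos = dict((entry[1], i) for i, entry in enumerate(self.heap))
        # Bottom-up construction, O(n) overall.
        for i in reversed(range(len(self.heap) // 2)):
            self._sift_down(i)

    def _swap(self, i, j):
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.pos[h[i][1]] = i
        self.pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if not self.heap[i] < self.heap[parent]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child] < self.heap[smallest]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def _fix(self, i):
        # Restore the invariant around index i after its entry changed.
        item = self.heap[i][1]
        self._sift_up(i)
        self._sift_down(self.pos[item])

    def remove(self, item):
        # Remove an arbitrary item in O(log n) using the index map.
        i = self.pos.get(item)
        if i is None:
            return None
        last = len(self.heap) - 1
        self._swap(i, last)
        priority, item = self.heap.pop()
        del self.pos[item]
        if i < last:
            self._fix(i)
        return item, priority

    def pop(self):
        # Popping the minimum is just removing whatever sits at the root.
        return self.remove(self.heap[0][1]) if self.heap else None

    def re_prioritize(self, items, prioritizer=lambda x: x + 1):
        for item in items:
            i = self.pos.get(item)
            if i is None:
                continue
            self.heap[i][0] = prioritizer(self.heap[i][0])
            self._fix(i)  # O(log n) per changed item, no full heapify
```

With one such structure per prioritization function, a pop from the active heap could be followed by remove(item) on each of the others, so all heaps stay in sync at O(log n) per heap.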