Let’s say I have a list of objects. (All together now: “I have a list of objects.”) In the web application I’m writing, each time a request comes in, I pick out up to one of these objects according to unspecified criteria and use it to handle the request. Basically like this:
def handle_request(req):
    for h in handlers:
        if h.handles(req):
            return h
    return None
Assuming the order of the objects in the list is unimportant, I can cut down on unnecessary iterations by keeping the list sorted such that the most frequently used (or perhaps most recently used) objects are at the front. I know this isn’t something to be concerned about – it’ll make only a minuscule, undetectable difference in the app’s execution time – but debugging the rest of the code is driving me crazy and I need a distraction 🙂 so I’m asking out of curiosity: what is the most efficient way to maintain the list in sorted order, descending, by the number of times each handler is chosen?
The obvious solution is to make handlers a list of [count, handler] pairs (lists rather than tuples, so the count can be incremented in place), and each time a handler is chosen, increment the count and re-sort the list.
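def handle_request(req):
    # handlers is a list of mutable [count, handler] pairs
    for h in handlers[:]:  # iterate over a copy, since the list is re-sorted below
        if h[1].handles(req):
            h[0] += 1
            # sort on the count only, so the handler objects are never compared
            handlers.sort(key=lambda pair: pair[0], reverse=True)
            return h[1]
    return None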
But since there’s only ever going to be at most one element out of order, and I know which one it is, it seems like some sort of optimization should be possible. Is there something in the standard library, perhaps, that is especially well-suited to this task? Or some other data structure? (Even if it’s not implemented in Python) Or should/could I be doing something completely different?
Python’s sort algorithm, timsort, is pretty magical: if your list is sorted except for one element, it will intrinsically (discover and) use that fact, sorting in O(N) time. (Josh Bloch, the Java guru, was so impressed by a presentation about timsort’s performance characteristics that he started coding it for Java on his laptop; it’s supposed to become Java’s standard sort pretty soon.) I’d just do a sort after each locate-and-increment-count, and very much doubt that other approaches can beat timsort.
Edit: the first alternative that comes to mind, of course, is to possibly “shift up” just the item whose count you’ve just incremented. But first, a little optimization to avoid copying handlers…):
Now, the “shift up” variant:
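What follows is a minimal sketch of both ideas, the copy-free sort and the “shift up”, and not necessarily the code originally posted here. It assumes handlers holds mutable [count, handler] lists kept in descending order of count. The PrefixHandler class and the names handle_request_sort and handle_request_shift are made up purely for illustration.

```python
class PrefixHandler:
    """Hypothetical handler: accepts requests that start with a given prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def handles(self, req):
        return req.startswith(self.prefix)


# Mutable [count, handler] pairs, kept in descending order of count.
handlers = [[0, PrefixHandler(p)] for p in ("/a", "/b", "/c")]


def handle_request_sort(req):
    """Increment the count and re-sort, without copying the list."""
    for pair in handlers:
        if pair[1].handles(req):
            pair[0] += 1
            # No copy is needed: the loop is abandoned right after the sort.
            handlers.sort(key=lambda p: p[0], reverse=True)
            return pair[1]
    return None


def handle_request_shift(req):
    """Increment the count, then move only the chosen pair toward the front."""
    for i, pair in enumerate(handlers):
        if pair[1].handles(req):
            pair[0] += 1
            j = i
            # Walk left past every pair whose count is now strictly smaller.
            while j > 0 and handlers[j - 1][0] < pair[0]:
                j -= 1
            if j < i:
                # Slide handlers[j:i] one slot to the right, then drop the pair in at j.
                handlers[j + 1:i + 1] = handlers[j:i]
                handlers[j] = pair
            return pair[1]
    return None
```

Both functions should leave the list in the same order: the sort is stable even with reverse=True, and the shift stops at the first pair whose count is not smaller, so a promoted handler lands behind any others with an equal count.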
I can imagine patterns of access where this approach might save a little time: for example, if the distribution was so skewed that most hits were in handlers[0], this would do little work beyond one comparison (while sort needs about N of them even in the best case). Without representative samples of your access patterns, I can’t confirm or disprove this!-)
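To get a feel for the “about N comparisons” figure, here is a rough sketch that counts how often list.sort calls `<`. The Counted wrapper and the comparisons_to_sort helper are invented for this illustration, and the exact counts depend on the interpreter version, so treat them as indicative only.

```python
class Counted:
    """Wraps a value and counts every "<" evaluated during sorting."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def comparisons_to_sort(values):
    """Sort a wrapped copy of values and return the number of "<" calls made."""
    Counted.comparisons = 0
    items = [Counted(v) for v in values]
    items.sort()
    # Sanity check: the wrapped sort agrees with a plain sort.
    assert [c.value for c in items] == sorted(values)
    return Counted.comparisons


n = 1000
already_sorted = list(range(n))
one_displaced = list(range(n))
one_displaced[500] = 250.5  # a single element out of place

sorted_cost = comparisons_to_sort(already_sorted)
displaced_cost = comparisons_to_sort(one_displaced)
print(sorted_cost, displaced_cost)
```

On CPython both numbers should come out in the neighbourhood of n, far below the n log n of a general sort, but still much more than the single comparison the shift needs when the hit is already at the front.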