I need to get the lesser n numbers of a list in Python. I need this to be really fast because it’s in a critical part for performance and it needs to be repeated a lot of times.
n is usually no greater than 10 and the list usually has around 20000 elements. The list is always different each time I call the function. Sorting can’t be made in place.
Initially, I have written this function:
def mins(items, n): mins = [float('inf')]*n for item in items: for i, min in enumerate(mins): if item < min: mins.insert(i, item) mins.pop() break return mins
But this function can’t beat a simple sorted(items)[:n] which sort the entire list. Here is my test:
from random import randint, random import time test_data = [randint(10, 50) + random() for i in range(20000)] init = time.time() mins = mins(test_data, 8) print 'mins(items, n):', time.time() - init init = time.time() mins = sorted(test_data)[:8] print 'sorted(items)[:n]:', time.time() - init
Results:
mins(items, n): 0.0632939338684 sorted(items)[:n]: 0.0231449604034
sorted()[:n] is three times faster. I believe this is because:
- insert() operation is costly because Python lists are not linked lists.
- sorted() is an optimized c function and mine is pure python.
Is there any way to beat sorted()[:n] ? Should I use a C extension, or Pyrex or Psyco or something like that?
Thanks in advance for your answers.
You actually want a sorted sequence of mins.
This runs much faster because you aren’t even looking at mins unless it’s provably got a value larger than the given item. About 1/10th the time of the original algorithm.
This ran in zero time on my Dell. I had to run it 10 times to get a measurable run time.
Using
bisect.insortinstead of append and sort may speed this up a hair further.