Based on that answer here are two versions of merge function used for mergesort.
Could you help me to understand why the second one is much faster.
I have tested it for list of 50000 and the second one is 8 times faster (Gist).
def merge1(left, right):
i = j = inv = 0
merged = []
while i < len(left) and j < len(right):
if left[i] <= right[j]:
merged.append(left[i])
i += 1
else:
merged.append(right[j])
j += 1
inv += len(left[i:])
merged += left[i:]
merged += right[j:]
return merged, inv
.
def merge2(array1, array2):
inv = 0
merged_array = []
while array1 or array2:
if not array1:
merged_array.append(array2.pop())
elif (not array2) or array1[-1] > array2[-1]:
merged_array.append(array1.pop())
inv += len(array2)
else:
merged_array.append(array2.pop())
merged_array.reverse()
return merged_array, inv
Here is the sort function:
def _merge_sort(list, merge):
len_list = len(list)
if len_list < 2:
return list, 0
middle = len_list / 2
left, left_inv = _merge_sort(list[:middle], merge)
right, right_inv = _merge_sort(list[middle:], merge)
l, merge_inv = merge(left, right)
inv = left_inv + right_inv + merge_inv
return l, inv
.
import numpy.random as nprnd
test_list = nprnd.randint(1000, size=50000).tolist()
test_list_tmp = list(test_list)
merge_sort(test_list_tmp, merge1)
test_list_tmp = list(test_list)
merge_sort(test_list_tmp, merge2)
Similar answer as kreativitea‘s above, but with more info (i think!)
So profiling the actual merge functions, for the merging of two 50K arrays,
merge 1
merge2
So for merge1, it’s 221309
len, 90436append, and takes 15.363 seconds.So for merge2, it’s 50000
len, 100000append, and 100000popand takes 0.104 seconds.lenandappendpopare all O(1) (more info here), so these profiles aren’t showing what’s actually taking the time, since going of just that, it should be faster, but only ~20% so.Okay the cause is actually fairly obvious if you just read the code:
In the first method, there is this line:
so every time that is called, it has to rebuild an array. If you comment out this line (or just replace it with
inv += 1or something) then it becomes faster than the other method. This is the single line responsible for the increased time.Having noticed this is the cause, the issue can be fixed by improving the code; change it to this for a speed up. After doing this, it will be faster than
merge2Update it to this: