For lack of a better name, I’d like to write an “izip_sorted” in Python. The input to the function is a number of iterables, each already sorted. The output is a single iterable that yields all of their elements in sorted order.
```python
print([x for x in izip_sorted([0,4,8], [1,3,5], [12,12,42],[])])
```
should print
[0, 1, 3, 4, 5, 8, 12, 12, 42]
Edit: This is a simple example. The real usage will be on about 40 input iterables, each with about 100,000 elements. Each element is an instance of a class that stores a dict and implements __cmp__ so that the elements can be sorted. The data is too large to read in all at once.
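The edit describes elements that store a dict and implement __cmp__. That method is only honored in Python 2; Python 3 ignores it and uses the rich comparison methods (__eq__, __lt__, and so on). A minimal Python 3 sketch, assuming the order comes from a single dict field: the class name Record and the field name "key" are hypothetical, and functools.total_ordering fills in the remaining comparisons from __eq__ and __lt__.

```python
import functools
import heapq


@functools.total_ordering
class Record:
    """Hypothetical element type: wraps a dict and orders by one of its fields."""

    def __init__(self, data, sort_field="key"):
        self.data = data
        self.sort_field = sort_field  # assumed name of the field to sort on

    def _sort_value(self):
        return self.data[self.sort_field]

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._sort_value() == other._sort_value()

    def __lt__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._sort_value() < other._sort_value()

    def __repr__(self):
        return f"Record({self.data!r})"


# Two already-sorted streams of records, merged lazily.
a = [Record({"key": 1}), Record({"key": 4})]
b = [Record({"key": 2}), Record({"key": 3})]
merged = [r.data["key"] for r in heapq.merge(a, b)]
```

Defining __eq__ makes instances unhashable unless __hash__ is also defined, which does not matter for sorting or merging.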
I have a solution, but I’m new to Python and I’m not sure it’s very Pythonic. Can it be improved? Re-sorting the whole list when only one element has changed seems wasteful…
```python
def izip_sorted(*iterables):
    """
    Return an iterator that outputs the values from the iterables, in sort order.

    izip_sorted('ABF', 'D', 'CE') --> A B C D E F
    """
    iterators = [iter(it) for it in iterables]
    # Pair each iterator with its first value, skipping empty iterables.
    current_iterators = []
    for it in iterators:
        try:
            current_iterators.append((next(it), it))
        except StopIteration:
            pass
    current_iterators.sort(key=lambda x: x[0])
    while current_iterators:
        # The front entry always holds the smallest pending value.
        yield current_iterators[0][0]
        try:
            current_iterators[0] = (next(current_iterators[0][1]), current_iterators[0][1])
            # Re-sort the whole list even though only one entry changed.
            current_iterators.sort(key=lambda x: x[0])
        except StopIteration:
            # That iterator is exhausted; the remaining entries are still sorted.
            current_iterators = current_iterators[1:]
```
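The re-sort on every step is the wasteful part: only the front entry changes, so a heap can restore order in O(log k) for k inputs instead of re-sorting all k entries. A minimal sketch of that idea using the standard heapq module (the name izip_sorted_heap is hypothetical). Each heap entry carries the iterator's position so that equal values never fall through to comparing the iterators themselves.

```python
import heapq

_EXHAUSTED = object()  # sentinel returned by next() when an iterator runs out


def izip_sorted_heap(*iterables):
    """Yield values from already-sorted iterables in sorted order, lazily."""
    heap = []
    for index, it in enumerate(map(iter, iterables)):
        first = next(it, _EXHAUSTED)
        if first is not _EXHAUSTED:
            # index breaks ties between equal values, so iterators are never compared
            heap.append((first, index, it))
    heapq.heapify(heap)
    while heap:
        value, index, it = heap[0]
        yield value
        nxt = next(it, _EXHAUSTED)
        if nxt is _EXHAUSTED:
            # this input is finished; drop its entry
            heapq.heappop(heap)
        else:
            # swap the smallest entry for this input's next value in O(log k)
            heapq.heapreplace(heap, (nxt, index, it))
```

This is similar in spirit to what the standard library's heapq.merge does, and it only ever holds one pending value per input.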
If the inputs are not sorted, then they must all be realized (essentially, turned from an iterable into a list). You can’t sort without looking at the data. LattyWare’s solution is the most pythonic.
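For unsorted inputs, one straightforward approach (a sketch, not necessarily LattyWare’s solution; the function name merge_unsorted is hypothetical) is to chain everything together and sort the result. sorted() has to consume every element before returning, so with the sizes given in the question's edit (about 40 × 100,000 elements) all of the data would be held in memory at once.

```python
import itertools


def merge_unsorted(*iterables):
    """Return all values from possibly unsorted iterables as one sorted list."""
    # sorted() reads the whole chained iterator into a list before sorting it.
    return sorted(itertools.chain.from_iterable(iterables))
```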
If, on the other hand, the input iterables are known to be sorted, you can use heapq.merge:
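A minimal sketch of that call on the question's example. heapq.merge returns an iterator and, per its documentation, does not pull the data into memory all at once; it assumes each input is already sorted. The second part assumes Python 3.5 or later, where merge accepts a key= argument, which lets plain dicts be merged by a field (the field name "key" is hypothetical) without defining comparison methods.

```python
import heapq

merged = list(heapq.merge([0, 4, 8], [1, 3, 5], [12, 12, 42], []))
print(merged)  # [0, 1, 3, 4, 5, 8, 12, 12, 42]

# Merging already-sorted streams of dicts by one field (Python 3.5+).
records_a = [{"key": 1}, {"key": 5}]
records_b = [{"key": 2}, {"key": 3}]
keys = [r["key"] for r in heapq.merge(records_a, records_b, key=lambda r: r["key"])]
```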