I have a series of data points (tuples) in a list with a format like:
points = [(1, 'a'), (2, 'b'), (2, 'a'), (3, 'd'), (4, 'c')]
The first item in each tuple is an integer, and the list is guaranteed to be sorted by it. The second item in each tuple is an arbitrary string.
I need the strings grouped into lists according to which interval their first value falls in. So given an interval of 3, the above list would be broken into:
[['a', 'b', 'a', 'd'], ['c']]
I wrote the following function, which works fine on small data sets. However, it is inefficient for large inputs. Any tips on how to rewrite/optimize/minimize this so I can process large data sets?
def split_series(points, interval):
    series = []
    start = points[0][0]
    finish = points[-1][0]
    marker = start
    next = start + interval
    while marker <= finish:
        series.append([point[1] for point in points if marker <= point[0] < next])
        marker = next
        next += interval
    return series
For completeness, here’s a solution with itertools.groupby, but the dictionary solution will probably be faster (not to mention a lot easier to read).
Note that the groupby version assumes you’ve got at least one item in each group; otherwise it’ll give different results from your script: groupby simply skips an interval that contains no points, whereas your script emits an empty list for it.
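As a rough sketch of how the groupby approach might look (the name split_series_groupby and the bucket-index key are my own choices for illustration, not something from the question): each point is keyed by the index of the interval it falls in, counted from the first point's value, and consecutive points with the same key end up in the same group. It relies on the input being sorted, and I've assumed an empty input should return an empty list.

```python
from itertools import groupby


def split_series_groupby(points, interval):
    """Group labels by interval in a single pass over sorted points."""
    if not points:
        # Assumption: return [] for empty input (the original raises IndexError).
        return []
    start = points[0][0]

    def bucket(point):
        # Index of the interval [start + k*interval, start + (k+1)*interval)
        return (point[0] - start) // interval

    # groupby only merges *consecutive* items with equal keys,
    # which is fine here because points are sorted by their first value.
    return [[label for _, label in group]
            for _, group in groupby(points, key=bucket)]


points = [(1, 'a'), (2, 'b'), (2, 'a'), (3, 'd'), (4, 'c')]
print(split_series_groupby(points, 3))
# [['a', 'b', 'a', 'd'], ['c']]

# An interval with no points is skipped entirely:
print(split_series_groupby([(1, 'a'), (8, 'b')], 3))
# [['a'], ['b']]   (the original function gives [['a'], [], ['b']])
```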
Here’s a fixed-up dictionary solution. At some point the dictionary lookup time will begin to dominate, but maybe it’s fast enough for you like this.
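Something along these lines, as a sketch rather than a definitive version (the name split_series_dict is just for illustration, and returning [] for empty input is my assumption): every point is dropped into a defaultdict bucket keyed by its interval index, so the list is scanned only once, and then the buckets are read out in order, with an empty list for any interval that got no points so the result matches the original function.

```python
from collections import defaultdict


def split_series_dict(points, interval):
    """Bucket labels by interval index, keeping empty intervals."""
    if not points:
        # Assumption: return [] for empty input (the original raises IndexError).
        return []
    start = points[0][0]
    buckets = defaultdict(list)
    for value, label in points:
        # Interval index counted from the first point's value
        buckets[(value - start) // interval].append(label)

    # points are sorted, so the last point determines the final interval
    last = (points[-1][0] - start) // interval
    # .get() avoids creating new entries for intervals that had no points
    return [buckets.get(i, []) for i in range(last + 1)]


points = [(1, 'a'), (2, 'b'), (2, 'a'), (3, 'd'), (4, 'c')]
print(split_series_dict(points, 3))
# [['a', 'b', 'a', 'd'], ['c']]

print(split_series_dict([(1, 'a'), (8, 'b')], 3))
# [['a'], [], ['b']]
```

The original rescans the whole list once per interval, so its cost grows with the number of points times the number of intervals; this version touches each point once and then each interval once.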