I have a series of data points (tuples) in a list with a format

Question

Asked: May 12, 2026 (2026-05-12T17:51:22+00:00)

I have a series of data points (tuples) in a list with a format like:

points = [(1, 'a'), (2, 'b'), (2, 'a'), (3, 'd'), (4, 'c')]

The first item in each tuple is an integer and they are assured to be sorted. The second value in each tuple is an arbitrary string.

I need them grouped in lists by their first value in a series. So given an interval of 3, the above list would be broken into:

[['a', 'b', 'a', 'd'], ['c']]

I wrote the following function, which works fine on small data sets. However, it is inefficient for large inputs. Any tips on how to rewrite, optimize, or minimize this so I can process large data sets?

def split_series(points, interval):
    series = []

    start = points[0][0]
    finish = points[-1][0]

    marker = start
    upper = start + interval  # exclusive upper bound of the current window
    while marker <= finish:
        # Every window rescans the whole list of points.
        series.append([point[1] for point in points if marker <= point[0] < upper])
        marker = upper
        upper += interval

    return series

1 Answer

Editorial Team · Answer 1 · 2026-05-12T17:51:22+00:00

For completeness, here’s a solution with itertools.groupby, but the dictionary solution will probably be faster (not to mention a lot easier to read).

import itertools
import operator

def split_series(points, interval):
    start = points[0][0]

    # Tag each value with its bucket index, then group consecutive equal indices.
    return [[v for _, v in grouper] for _, grouper in
            itertools.groupby((((n - start) // interval, val)
                               for n, val in points), operator.itemgetter(0))]

Note that the above assumes every group contains at least one item. Otherwise it skips the empty groups and gives different results from your script. For example:

>>> split_series([(1, 'a'), (2, 'b'), (6, 'a'), (6, 'd'), (11, 'c')], 3)
[['a', 'b'], ['a', 'd'], ['c']]

instead of

[['a', 'b'], ['a', 'd'], [], ['c']]
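
If the empty groups matter, one way to keep the groupby approach is to pad the result whenever a bucket index is skipped. This is only a sketch: the name split_series_padded is made up for illustration, and it assumes the points are sorted by their first item, as stated in the question.

```python
import itertools

def split_series_padded(points, interval):
    """Group values by interval bucket, keeping empty buckets as []."""
    if not points:
        return []
    start = points[0][0]
    series = []
    # Bucket 0 covers [start, start + interval), bucket 1 the next window, etc.
    buckets = itertools.groupby(points, key=lambda p: (p[0] - start) // interval)
    for bucket, group in buckets:
        # Add an empty list for every bucket skipped since the previous group.
        series.extend([] for _ in range(bucket - len(series)))
        series.append([val for _, val in group])
    return series
```

With the input from the example above, this should return the same four lists as the original script, including the empty one.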

Here’s a fixed-up dictionary solution. At some point the dictionary lookup time will begin to dominate, but maybe it’s fast enough for you like this.

from collections import defaultdict

def split_series(points, interval):
    offset = points[0][0]
    maxval = (points[-1][0] - offset) // interval
    vals = defaultdict(list)
    for key, value in points:
        vals[(key - offset) // interval].append(value)
    return [vals[i] for i in range(maxval + 1)]
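
If the dictionary lookups do turn out to matter, a possible alternative is to append straight into a list of lists and grow it as new bucket indices appear. This is a rough sketch rather than a benchmarked result: split_series_linear is a hypothetical name, and it relies on the points being sorted so that bucket indices never decrease.

```python
def split_series_linear(points, interval):
    """Single pass over sorted points, with no dictionary involved."""
    if not points:
        return []
    offset = points[0][0]
    series = [[]]
    for key, value in points:
        index = (key - offset) // interval
        # Grow the result until the bucket exists; skipped buckets stay empty.
        while len(series) <= index:
            series.append([])
        series[index].append(value)
    return series
```

Whether this is actually faster than the defaultdict version depends on the data, so it is worth timing both on a realistic input.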
