I’m looking for an algorithm to segment a sequence of positive numbers into n

Asked: May 13, 2026

I’m looking for an algorithm to segment a sequence of positive numbers into n subsequences, such that the standard deviation of the subsequence sums is minimized.

The ordering of the numbers in each subsequence needs to be the same as the ordering in the original sequence.

For example:

Suppose I have a sequence {1,1,1,1,1,1,10,1} that I want to segment into 2 subsequences.
I believe the optimal solution would be {1,1,1,1,1,1}, {10,1}.

The sum of the 1st subsequence is 6 and the sum of the 2nd subsequence is 11.
The standard deviation of the two numbers is ~3.5, which I believe is the lowest possible.

Suppose I have a sequence {4,1,1,1,1,6} that I want to segment into 3 subsequences.
I believe the optimal solution would be {4}, {1,1,1,1}, {6}.
The sums of the subsequences are 4, 4, and 6.
The standard deviation of the 3 numbers is ~1.15, which I believe is the lowest possible.
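
The two figures above can be reproduced with a few lines of Python (the language is chosen only for illustration; `segment_sum_stdev` is a hypothetical helper name). The quoted values of ~3.5 and ~1.15 match the sample standard deviation (divisor n-1); the population version would give 2.5 and about 0.94 instead.

```python
from statistics import stdev

def segment_sum_stdev(segments):
    """Sample standard deviation of the per-segment sums."""
    return stdev([sum(seg) for seg in segments])

print(round(segment_sum_stdev([[1, 1, 1, 1, 1, 1], [10, 1]]), 2))  # 3.54
print(round(segment_sum_stdev([[4], [1, 1, 1, 1], [6]]), 2))       # 1.15
```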

The best algorithm I was able to come up with is to compute the cumulative sums of the sequence and cut it wherever the cumulative sum crosses a multiple of totalSum/numSubsequences.

For example, given the sequence {4,1,1,1,1,6}, the cumulative sums are {4,5,6,7,8,14}. The total of all numbers in the sequence is 14, so, given that I want 3 subsequences, I should cut where the cumulative sum reaches 14/3 ≈ 4.67 and 2 * 14/3 ≈ 9.33.

However, there is no place in the sequence where the cumulative sum equals 4.67: the first cumulative sum is 4 and the next is 5. So should I round up or round down? In this case, rounding down to 4 gives the optimal solution, but that isn’t always the case. The best I can think of is to try every combination of rounding up and down, but that results in O(2^numSubsequences) complexity.
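
To make the rounding dilemma concrete, here is a small Python sketch (illustrative only; `rounding_candidates` is a hypothetical name) that lists, for each ideal cut point, the "round down" and "round up" cut positions. A cut position p means "cut after the first p elements".

```python
from bisect import bisect_left
from itertools import accumulate

def rounding_candidates(seq, n):
    """For each ideal cut m*total/n, return (round_down, round_up) cut positions."""
    cum = list(accumulate(seq))  # strictly increasing because all numbers are positive
    total = cum[-1]
    candidates = []
    for m in range(1, n):
        target = m * total / n
        # Number of prefixes whose cumulative sum is strictly below the target.
        below = bisect_left(cum, target)
        candidates.append((below, below + 1))
    return candidates

print(rounding_candidates([4, 1, 1, 1, 1, 6], 3))  # [(1, 2), (5, 6)]
```

For the second cut, rounding up lands on position 6, the end of the sequence, which would leave the last subsequence empty. Edge cases like this are one reason the heuristic needs care beyond a simple choice of rounding direction.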

This seems like the type of problem that would have a preexisting algorithm, but my Googling has failed me. I am aware of the Partition Problem, which is NP-complete, but that deals with unordered sets, not ordered sequences.

Any help would be appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-05-13T14:08:22+00:00

Suppose the length of the original sequence is L and the number of subsequences is N.

You may simplify the expression for standard deviation to get sqrt(E[X^2] - E[X]^2), where E denotes expectation/average and X denotes your random variable, in your case the sum of a subsequence. (A similar formula applies to the “sample standard deviation”.) Note that E[X] does not depend on how you split your sequence, because it is always the total sum divided by N. Thus, we just want to minimize E[X^2] or, equivalently, the sum of the X^2 values (the two differ by a factor of N, by the definition of average).
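
As a sanity check on this reduction, the following Python sketch (illustrative, not part of the original answer; `all_splits` and `sums` are hypothetical helpers) enumerates every split of the question's second example by brute force and confirms that minimizing the standard deviation and minimizing the sum of squared sums select the same split. It assumes non-empty pieces and is only practical for tiny inputs.

```python
from itertools import combinations
from statistics import pstdev

def all_splits(seq, n):
    """Yield every way to cut seq into n non-empty contiguous pieces."""
    for cuts in combinations(range(1, len(seq)), n - 1):
        bounds = (0, *cuts, len(seq))
        yield [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def sums(parts):
    return [sum(p) for p in parts]

seq = [4, 1, 1, 1, 1, 6]
best_by_std = min(all_splits(seq, 3), key=lambda parts: pstdev(sums(parts)))
best_by_squares = min(all_splits(seq, 3),
                      key=lambda parts: sum(s * s for s in sums(parts)))
print(best_by_std)      # [[4], [1, 1, 1, 1], [6]]
print(best_by_squares)  # [[4], [1, 1, 1, 1], [6]]
```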

At this point, we can see that this problem can be solved with dynamic programming. Let f(i,j), for i from 0 to L and j from 1 to N, be the minimal sum of squared subsequence sums over all splits of the first i elements of your sequence into j subsequences. Then f(i,j) can be computed from the values f(i',j') with i' <= i and j' < j. More specifically, if your sequence is a[k], indexed from 0 to L-1:

f(i,1) = sum( a[k] for 0 <= k < i )^2
f(i,j) = minimum of  f(l,j-1)+sum( a[k] for l <= k < i )^2  for l from 0 to i
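
A direct transcription of this recurrence into Python might look like the sketch below (illustrative only; `min_sum_of_squares` is a hypothetical name, and prefix sums are an added convenience so that each piece sum costs O(1)). It returns only the minimal value, not the split itself.

```python
from functools import lru_cache
from itertools import accumulate

def min_sum_of_squares(a, n):
    prefix = [0, *accumulate(a)]  # prefix[i] = a[0] + ... + a[i-1]

    @lru_cache(maxsize=None)
    def f(i, j):
        if j == 1:
            return prefix[i] ** 2
        # The last piece is a[l:i]; l == i allows an empty last piece,
        # matching "l from 0 to i" in the recurrence.
        return min(f(l, j - 1) + (prefix[i] - prefix[l]) ** 2
                   for l in range(i + 1))

    return f(len(a), n)

print(min_sum_of_squares([1, 1, 1, 1, 1, 1, 10, 1], 2))  # 157
print(min_sum_of_squares([4, 1, 1, 1, 1, 6], 3))          # 68
```

The values 157 = 6^2 + 11^2 and 68 = 4^2 + 4^2 + 6^2 correspond to the splits proposed in the question.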

Having computed f(L,N), you can use standard dynamic programming techniques to recover the splits. In particular, for each (i,j) you can store the l that attains the minimum in f(i,j) and then walk backwards from (L,N).
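
The recovery step can be sketched bottom-up in Python as follows (illustrative only; `best_split` and the table name `arg` are hypothetical). Unlike the recurrence as written, this variant assumes 1 <= N <= L and forces every piece to be non-empty, which loses nothing for positive numbers because splitting a piece never increases the sum of squares.

```python
from itertools import accumulate

def best_split(a, n):
    """Split a into n non-empty contiguous pieces minimizing the sum of squared sums."""
    L = len(a)
    if not 1 <= n <= L:
        raise ValueError("need 1 <= n <= len(a)")
    prefix = [0, *accumulate(a)]
    INF = float("inf")
    f = [[INF] * (n + 1) for _ in range(L + 1)]
    arg = [[0] * (n + 1) for _ in range(L + 1)]  # start index of the last piece
    for i in range(1, L + 1):
        f[i][1] = prefix[i] ** 2
    for j in range(2, n + 1):
        for i in range(j, L + 1):
            for l in range(j - 1, i):
                cost = f[l][j - 1] + (prefix[i] - prefix[l]) ** 2
                if cost < f[i][j]:
                    f[i][j], arg[i][j] = cost, l
    # Walk the stored argmins backwards from (L, n).
    pieces, i = [], L
    for j in range(n, 0, -1):
        l = arg[i][j]
        pieces.append(a[l:i])
        i = l
    return pieces[::-1]

print(best_split([1, 1, 1, 1, 1, 1, 10, 1], 2))  # [[1, 1, 1, 1, 1, 1], [10, 1]]
print(best_split([4, 1, 1, 1, 1, 6], 3))          # [[4], [1, 1, 1, 1], [6]]
```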

The runtime of this solution is O(L^2 N) because you compute O(L N) different values of f and the minimum is over O(L) different values of l.

Here’s a straightforward implementation in Perl:

#!/usr/bin/perl

use strict;
use warnings;

local $\ = $/;
print join ", ", map {"@$_"} best( 2, qw(1 1 1 1 1 1 10 1) );
# prints "1 1 1 1 1 1, 10 1"

print join ", ", map {"@$_"} best( 3, qw(4 1 1 1 1 6) );
# prints "4, 1 1 1 1, 6"

sub best {
    my( $N, @a ) = @_;

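    # $f[$i][$j]: minimal sum of squared piece sums for the first $i elements in $j pieces
    # $g[$i][$j]: start index of the last piece in that optimal split
    my( @f, @g, $i, $j, $k, $sum );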

    # DP base case
    $sum = 0;
    $f[0][1] = $g[0][1] = 0;
    for $i ( 1 .. @a ) {
        $sum += $a[$i-1];
        $f[$i][1] = $sum * $sum;
        $g[$i][1] = 0;
    }

    # DP recurrence
    for $j ( 2 .. $N ) {
        $f[0][$j] = $g[0][$j] = 0;
        for $i ( 1 .. @a ) {
            $sum = 0;
            $f[$i][$j] = $f[$i][$j-1];
            $g[$i][$j] = $i;
            for $k ( reverse 0 .. $i-1 ) {
                $sum += $a[$k];
                if( $f[$i][$j] > $f[$k][$j-1] + $sum * $sum ) {
                    $f[$i][$j] = $f[$k][$j-1] + $sum * $sum;
                    $g[$i][$j] = $k;
                }
            }
        }
    }

    # Recover the best split by walking the stored start indices backwards
    my( @result );
    $i = @a; $j = $N;

    while( $j ) {
        $k = $g[$i][$j];
        unshift @result, [@a[$k .. $i-1]];
        $i = $k;
        $j--;
    }

    return @result;
}
