I am trying to port a CGI script using a Pythonic style of coding


Asked: May 27, 2026


I am trying to port a CGI script from C to Python, using a Pythonic coding style.

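sequence = "aaaabbababbbbabbabb"
# Splitting on one letter leaves the runs of the other letter (plus empty strings)
res = sequence.split("a") + sequence.split("b")
# Drop the empty strings produced between consecutive separators
res = [s for s in res if s]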

The result is

>>> res
['bb', 'b', 'bbbb', 'bb', 'bb', 'aaaa', 'a', 'a', 'a', 'a']

This took about 100 lines of code in C. Now I want to efficiently count how many items in the res list have each length. For example, here res contains 5 elements of length 1, 3 elements of length 2, and 2 elements of length 4.

The problem is that the sequence string can be very big.
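Because the input can be very large, one option (not taken from the original post) is to skip building res altogether and let re.finditer walk the string lazily, producing one match per run of a repeated character. For the two-letter strings in the question, each run is exactly one of the non-empty pieces the split approach produces. The function name run_length_histogram is hypothetical; this is a sketch, not a benchmarked solution.

```python
import re
from collections import Counter

# One match per maximal run of a repeated character; DOTALL lets "." match newlines too
RUN = re.compile(r"(.)\1*", re.DOTALL)

def run_length_histogram(sequence):
    """Map run length -> number of runs, without building an intermediate list."""
    # m.end() - m.start() is the run length, computed without copying the matched text
    return Counter(m.end() - m.start() for m in RUN.finditer(sequence))

print(run_length_histogram("aaaabbababbbbabbabb"))
# Counter({1: 5, 2: 3, 4: 2})
```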


1 Answer

Editorial Team · Answer 1 · 2026-05-27T05:26:58+00:00

The easiest way to generate a histogram of string lengths given a list of strings is to use collections.Counter:

>>> from collections import Counter
>>> a = ["a", "b", "aaa", "bb", "aa", "bbb", "", "a", "b"]
>>> Counter(map(len, a))
Counter({1: 4, 3: 2, 2: 2, 0: 1})
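A small sketch applying the same Counter idea to the res list built in the question, then sorting by length for display. The name length_counts is only for illustration.

```python
from collections import Counter

sequence = "aaaabbababbbbabbabb"
# The question's approach: split on each letter and drop the empty strings
res = [s for s in sequence.split("a") + sequence.split("b") if s]

# Histogram of lengths: length -> how many items in res have that length
length_counts = Counter(map(len, res))

# Print in order of increasing length
for length, count in sorted(length_counts.items()):
    print(f"length {length}: {count}")
# length 1: 5
# length 2: 3
# length 4: 2
```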

Edit: There is also a better way to find runs of equal characters, namely itertools.groupby():

>>> from itertools import groupby
>>> sequence = "aaaabbababbbbabbabb"
>>> Counter(len(list(run)) for _, run in groupby(sequence))
Counter({1: 5, 2: 3, 4: 2})
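If the sequence is too large to hold in memory as one string, a sketch along these lines (an assumption beyond the answer above; count_runs_streaming is a hypothetical name) reads the text in fixed-size chunks and applies groupby to each chunk. A run can straddle a chunk boundary, so the most recent run is kept pending and is only counted once a different character appears.

```python
import io
from collections import Counter
from itertools import groupby

def count_runs_streaming(stream, chunk_size=1 << 20):
    """Count run lengths in a text stream, reading it chunk by chunk."""
    counts = Counter()
    pending_char, pending_len = None, 0  # the run that may continue into the next chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for char, group in groupby(chunk):
            length = sum(1 for _ in group)  # count the run without building a list
            if char == pending_char:
                # Only possible for a chunk's first group: the run crossed a boundary
                pending_len += length
            else:
                if pending_char is not None:
                    counts[pending_len] += 1  # previous run is finished
                pending_char, pending_len = char, length
    if pending_char is not None:
        counts[pending_len] += 1  # close the final run
    return counts

# Tiny chunk size so that several runs cross chunk boundaries
print(count_runs_streaming(io.StringIO("aaaabbababbbbabbabb"), chunk_size=3))
# Counter({1: 5, 2: 3, 4: 2})
```

For a file on disk, something like `with open("sequence.txt") as f: count_runs_streaming(f)` should work, where sequence.txt is a placeholder path. Newlines are treated like any other character, so a trailing newline would be counted as an extra run of length 1.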
