I am trying to port a CGI script to Python using a Pythonic style of coding.
sequence = "aaaabbababbbbabbabb"
res = sequence.split("a") + sequence.split("b")
res = [l for l in res if l]
The result is
>>> res
['bb', 'b', 'bbbb', 'bb', 'bb', 'aaaa', 'a', 'a', 'a', 'a']
This was ~100 LOC in C. Now I want to count the items with the same length in the res list efficiently. For example, here res contains 5 elements of length 1, 3 elements of length 2 and 2 elements of length 4.
The problem is that the sequence string can be very big.
The easiest way to generate a histogram of string lengths given a list of strings is to use collections.Counter.

Edit: There is also a better way to find runs of equal characters, namely itertools.groupby().
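As a rough sketch of both suggestions, something like the following should work. The helper name run_length_histogram is made up for illustration, and the example reuses the sequence from the question. The groupby variant walks the string once and never builds the intermediate res list, which may matter when the sequence is very big.

```python
from collections import Counter
from itertools import groupby

sequence = "aaaabbababbbbabbabb"

# Approach 1: build the list as in the question, then count the lengths.
res = [s for s in sequence.split("a") + sequence.split("b") if s]
by_split = Counter(map(len, res))

# Approach 2 (hypothetical helper): count run lengths directly.
# groupby() yields one group per run of equal consecutive characters,
# so no intermediate list of substrings is created.
def run_length_histogram(seq):
    return Counter(sum(1 for _ in group) for _, group in groupby(seq))

by_groupby = run_length_histogram(sequence)

print(by_split)
print(by_groupby)
```

Both counters map a run length to the number of runs with that length, so for the sample sequence they should agree with the counts given in the question (5 runs of length 1, 3 of length 2, 2 of length 4).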