The RLE (run length encoding) pattern seems to come up a lot in my work.
The essence of it is that you are outputting a reduction of the elements encountered since the last ‘break’ each time that you see a ‘break’ or you reach the end of the input.
(In actual RLE, the ‘break’ is just the current character not matching the previous one. In the real world it’s usually a little more complex, but it is still a function of the current and last elements.)
I want to remove the duplicate last_val != None: rle.append((last_val, count)) condition and action, which occur both inside the loop and again after it.
The issues are:
- replacing them with function calls results in more code, not less.
- I want to keep it in an imperative style (in Haskell, for example, the problem just evaporates).
The imperative Python code is:
#!/usr/bin/env python
data = "abbbccac"

if __name__ == '__main__':
    rle = []
    last_val = None
    count = 0
    for val in data:
        # A change of value ends the current run (skip the very first element).
        if val != last_val and last_val != None:
            rle.append((last_val, count))
            count = 1
        else:
            count += 1
        last_val = val
    # Flush the final run: this duplicates the condition and action above.
    if last_val != None:
        rle.append((last_val, count))
    print(rle)
P.S. Trivially solvable in functional languages:
#!/usr/bin/env runhaskell
import Data.List (group)
dat = "abbbccac"
rle :: Eq a => [a] -> [(a, Int)]
rle arr = map (\g -> (head g, length g)) $ group arr
main :: IO ()
main = print $ rle dat
Here is a more imperative form. You can eliminate the duplicate code by appending (or chaining on) a throwaway sentinel that will never match any of your list elements, which forces one final end-of-sequence pass through your “this-not-equal-last” code.
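A minimal sketch of that sentinel idea might look like the following. The function name `rle` and the `_SENTINEL` object are my own choices for illustration, and I am assuming `itertools.chain` is an acceptable way to tack the sentinel onto the input:

```python
from itertools import chain

# Hypothetical throwaway marker: a fresh object() never compares equal
# to ordinary data, so it always triggers one last "break".
_SENTINEL = object()

def rle(seq):
    """Run-length encode any iterable into a list of (value, count) pairs."""
    out = []
    last_val = _SENTINEL
    count = 0
    # Chaining the sentinel onto the end means the append below is the
    # only place a run is flushed; no second copy after the loop.
    for val in chain(seq, [_SENTINEL]):
        if val != last_val and last_val is not _SENTINEL:
            out.append((last_val, count))
            count = 1
        else:
            count += 1
        last_val = val
    return out

print(rle("abbbccac"))
# [('a', 1), ('b', 3), ('c', 2), ('a', 1), ('c', 1)]
```

The only structural change from the loop in the question is that `None` is replaced by a private sentinel and the trailing `if` is gone, because the sentinel makes the loop body run once more at the end.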
This even gracefully handles the case where the input sequence is empty, and the input can be any sequence, iterator, or generator, not just a string.
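To illustrate the empty-input and generator cases, and the "real world" situation where the break is some function of the current and last elements, here is a rough generalisation of the same sentinel trick. The name `runs` and the `is_break(prev, cur)` parameter are hypothetical, not from any library:

```python
from itertools import chain

def runs(seq, is_break=lambda prev, cur: prev != cur):
    """Yield (first_value, count) per run; is_break(prev, cur) starts a new run."""
    sentinel = object()  # private end marker, never passed to is_break
    first = prev = sentinel
    count = 0
    for cur in chain(seq, [sentinel]):
        # Flush on a break or on the trailing sentinel, but never before
        # the first real element has been seen.
        if prev is not sentinel and (cur is sentinel or is_break(prev, cur)):
            yield (first, count)
            count = 0
        if count == 0:
            first = cur
        count += 1
        prev = cur

print(list(runs("abbbccac")))
# [('a', 1), ('b', 3), ('c', 2), ('a', 1), ('c', 1)]
print(list(runs([])))
# []
# Runs of consecutive integers, fed from a generator:
print(list(runs((n for n in [1, 2, 3, 7, 8, 10]), lambda p, c: c != p + 1)))
# [(1, 3), (7, 2), (10, 1)]
```

Because the flush still happens in exactly one place, swapping in a different break rule does not bring the duplicated end-of-input code back.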