There is a frequently asked question in interviews about compressing a string.
I’m not looking for code; I only need an efficient algorithm that solves the problem.
Given a string (e.g. aaabbccaaadd), compress it (3a2b2c3a2d).
My solution:
Traverse the string. Every time I see the same letter again, I increment a counter.
When a different letter comes up, I output the counter and the letter, then start counting again.
Is there a more efficient way to do this?
Thanks
That’s called run-length encoding, and the algorithm you describe is basically the best you’ll get. It takes O(1) auxiliary storage (save the last symbol seen, or equivalently inspect the upcoming element; also save a counter of how many identical symbols you’ve seen) and runs in O(n) time. As you need to inspect each symbol at least once to know the result, you can’t get better than O(n) time anyway. What’s more, it can also process streams one symbol at a time, and output one run at a time, so you actually only need O(1) RAM.
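You said you don't need code, but for reference, here is a minimal Python sketch of that single pass. The names rle_runs and rle_encode are my own, and the count-then-letter output format is taken from your example. It is meant as an illustration, not a tuned implementation.

```python
from typing import Iterable, Iterator, Tuple


def rle_runs(symbols: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (count, symbol) for each run, reading one symbol at a time."""
    it = iter(symbols)
    try:
        current = next(it)  # the last symbol seen
    except StopIteration:
        return  # empty input produces no runs
    count = 1
    for sym in it:
        if sym == current:
            count += 1  # same run continues
        else:
            yield count, current  # run ended: emit it and start over
            current, count = sym, 1
    yield count, current  # emit the final run


def rle_encode(text: str) -> str:
    """Join the runs into the count-then-letter format from the question."""
    return "".join(f"{count}{sym}" for count, sym in rle_runs(text))


print(rle_encode("aaabbccaaadd"))  # prints 3a2b2c3a2d
```

Because rle_runs is a generator over any iterable, it only ever holds the current symbol and its counter, so it works on a stream as well as on a string held in memory.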
You can pull a number of tricks to get the constant factors better, but the algorithm remains basically the same. Such tricks include:
Such micro-optimizations may be moot if your data source is slow. For the level of optimization some of my points above address, even RAM can count as slow.