I have been researching this for two days without finding anything, so I decided to write my own string repetition detector. Basically, the function
def findRepetitions (string):
would receive a string, search it for repetitions, and return a list of the repeated substrings, each reduced to its simplest form.
For example:
findRepetitions ("trololololo") --> ["olo"]
findRepetitions ("bookkeeper") ---> ["o", "k", "e"]
findRepetitions ("Hello, Molly") -> ["l", "l"]
findRepetitions ("abcdefgh") -----> []
findRepetitions ("102102102") ----> ["102"]
In the third example, the function returns ["l", "l"] instead of ["ll"], because I want to search for repetitions only among neighboring characters.
I know this may be hard, but I have been thinking it over for a long time and cannot find a smart solution.
Your examples are inconsistent. For example, `olo` does not repeat in `trololololo` the way the `l` does in `Hello, Molly`; there's an `l` between instances. Sequential repeats in `trololololo` are `lolo`, `lo`, `olol`, and `ol`. Are you asking for a 'greedy' algorithm? So, given `trololololo`, it would return `olol`?
In any case, here's a bit of code.
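A minimal sketch of what such code might look like, assuming a "sequential repeat" means a substring that occurs twice back to back. The name `find_repetition` is borrowed from the warning further down; the body is an assumption for illustration, not necessarily the code this answer originally used.

```python
from collections import Counter


def find_repetition(text):
    """Return the substrings that occur back to back somewhere in text."""
    # Count every substring of every length. The number of substrings
    # grows quadratically with len(text), which is why this is slow.
    counts = Counter(
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    )
    # A substring can only repeat sequentially if it occurs at least twice;
    # of those, keep the ones whose doubled form appears in the text.
    return sorted(s for s, n in counts.items() if n > 1 and s * 2 in text)


print(find_repetition("trololololo"))  # ['lo', 'lolo', 'ol', 'olol']
```

For `bookkeeper` the same sketch gives `['e', 'k', 'o']`, and for `abcdefgh` an empty list.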
If you want it to be ‘greedy’ like I described, you have to add in another function that takes the results from repeats and chomps away at your string when it finds a match.
For now, the results look like this:
Warning: `find_repetition` is not very quick, since it basically generates the substrings of every possible length and throws them all into a Counter object.
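As for the 'greedy' variant mentioned above, here is one possible sketch of the chomping idea. The function name `chomp_repeats` and the `longest_first` flag are hypothetical, and unlike the description above it scans the string directly rather than taking a precomputed list of repeats. At each position it looks for a unit that is immediately followed by a copy of itself, records it, and skips past the whole run.

```python
def chomp_repeats(text, longest_first=True):
    """Scan left to right, recording each adjacent repeat and skipping its run."""
    found = []
    i = 0
    while i < len(text):
        # A unit starting at i can be at most half of the remaining text.
        max_len = (len(text) - i) // 2
        if longest_first:
            lengths = range(max_len, 0, -1)   # greedy: prefer the longest unit
        else:
            lengths = range(1, max_len + 1)   # prefer the simplest unit
        for n in lengths:
            unit = text[i:i + n]
            if text[i + n:i + 2 * n] == unit:
                found.append(unit)
                # Chomp every back-to-back copy of the unit.
                while text[i:i + n] == unit:
                    i += n
                break
        else:
            # No repeat starts here, so move on by one character.
            i += 1
    return found


print(chomp_repeats("trololololo"))                       # ['olol']
print(chomp_repeats("trololololo", longest_first=False))  # ['ol']
print(chomp_repeats("Hello, Molly"))                      # ['l', 'l']
```

With `longest_first=False` the results for `bookkeeper`, `Hello, Molly`, `abcdefgh`, and `102102102` match the examples in the question; `trololololo` gives `['ol']` rather than `['olo']`, since `olo` never sits next to a copy of itself.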