A string = "aabbccaaabbcbbdbabdaaa";
How can this string be checked efficiently for duplicated substrings inside it?
I mean:
- Looking for 2-letter strings in the string:
aa = "aa bbcc aa abbcbbdbabd aa a";
// NO whitespace here or elsewhere in the string. I just added it to emphasize "aa".
aa = "aa bbcca aa bbcbbdbabda aa";
total aa = 5;
distance between aa = 4,5,11,12;
bb = "aa bb ccaaa bb c bb dbabdaaa";
total bb = 3;
distance between bb = 5,1
…
- Looking for 3-letter strings in the string:
aaa = "aaa bbcc aaa bbcbbdbabd aaa";
total aaa = 3;
distance between aaa = 4,10;
…
My attempt used 4 nested loops and was very slow.
P.S.
Any help is appreciated. Sorry for my English.
EDIT:
Sorry for the bad question. I forgot to say that the string should also be checked for 4-character duplicates and for duplicates of other lengths:
aabb = "aabb cca aabb cbbdbabdaaa";
total aabb = 2;
distance between aabb = 3;
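For one given substring, the counts and distances in these examples can be reproduced with a short C# sketch. It assumes that "distance" means the number of characters between the end of one occurrence and the start of the next, which matches the bb and aabb figures. The names OccurrenceFinder, FindAll and Gaps are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class OccurrenceFinder
{
    // Returns the start index of every occurrence of pattern in text,
    // overlapping occurrences included.
    public static List<int> FindAll(string text, string pattern)
    {
        var starts = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
            return starts;

        int i = text.IndexOf(pattern, StringComparison.Ordinal);
        while (i >= 0)
        {
            starts.Add(i);
            // Step forward by one character only, so "aaa" yields two "aa" matches.
            i = text.IndexOf(pattern, i + 1, StringComparison.Ordinal);
        }
        return starts;
    }

    // Assumed meaning of "distance": characters between the end of one
    // occurrence and the start of the next. Overlapping neighbours give
    // a negative value.
    public static List<int> Gaps(IReadOnlyList<int> starts, int patternLength)
    {
        var gaps = new List<int>();
        for (int k = 1; k < starts.Count; k++)
            gaps.Add(starts[k] - (starts[k - 1] + patternLength));
        return gaps;
    }
}
```

With the string above, FindAll(text, "bb") gives the start indices 2, 9, 12 and Gaps(starts, 2) gives 5, 1. For "aabb" the indices are 0, 7 and the single gap is 3.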
EDIT 2:
The duplicates we are looking for should not be entered manually. Imagine that the string is 20k symbols long and you're searching for ANY duplicates (there are no whitespaces) and the distance between those duplicates.
Thanks, and sorry again for the incorrect question.
Here’s a solution in C#
I wrote this according to your comment…
I think this is working pretty well. It returns a Dictionary containing the string and the indices of each duplicate. In terms of performance, calling IndexOf() so many times is probably the slowest part of this, but I don't know any way around that.
UPDATE
I changed the code to include the overlapping requirement.
UPDATE #2
I added a couple of conditions where the algorithm will break out of the inner for loop. This improves performance quite a bit (especially when there are few duplicates to be found).
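A minimal sketch of this kind of search, not the original code: it returns a Dictionary from each repeated substring to the start indices of its occurrences. Instead of repeated IndexOf() calls, it groups all substrings of one length in a Dictionary, and it stops at the first length where nothing repeats. DuplicateFinder, FindRepeats and the minLength parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateFinder
{
    // Maps every substring of length >= minLength that occurs at least twice
    // to the ascending list of its start indices (overlaps included).
    public static Dictionary<string, List<int>> FindRepeats(string text, int minLength = 2)
    {
        var result = new Dictionary<string, List<int>>(StringComparer.Ordinal);
        if (string.IsNullOrEmpty(text) || minLength < 1)
            return result;

        for (int len = minLength; len < text.Length; len++)
        {
            // Group the start indices of all substrings of this length.
            var seen = new Dictionary<string, List<int>>(StringComparer.Ordinal);
            for (int i = 0; i + len <= text.Length; i++)
            {
                string sub = text.Substring(i, len);
                if (!seen.TryGetValue(sub, out var starts))
                {
                    starts = new List<int>();
                    seen[sub] = starts;
                }
                starts.Add(i);
            }

            bool anyRepeat = false;
            foreach (var pair in seen)
            {
                if (pair.Value.Count > 1)
                {
                    result[pair.Key] = pair.Value;
                    anyRepeat = true;
                }
            }

            // A repeated longer string always starts with a repeated string
            // of this length, so if nothing repeats here we can stop.
            if (!anyRepeat)
                break;
        }
        return result;
    }
}
```

For "aabbccaaabbcbbdbabdaaa" the result contains, among others, "bb" with indices 2, 9, 12 and "aabb" with indices 0, 7. Memory use grows with the number of distinct substrings, so for a 20k-symbol string this is a starting point rather than a tuned solution.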