I have a string about a thousand characters long, composed of L’s, T’s, and A’s. I’m pretty sure there is a simple pattern in it, and I’m wondering if there is a quick and easy way to find it. The string changes, so this is not just a one-off.
As an example of the kind of pattern I’m looking for, suppose the string was
LLLLLLLLAATAALLLLALLLLLLAATAALLLATLLLLLAATAALLAALLLLLAATAALL
The substring LLLLLAATAALL repeats 4 times in this string. I want to search for substrings like this, but I don’t know where they start or end, how many there are, or how long they are.
If there are any tools in Java for this kind of search, any advice would be much appreciated.
Ok, so I took the code from here and adapted it to keep track of and print the longest repeated substring. Just run it using JUnit.
Output:
EDIT: response to your comments.
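Currently I simply keep track of the longest repeating SuffixTreeNode (it’s a field in AbstractSuffixTree). You could modify this so that it keeps a sorted collection of nodes (for example a java.util.PriorityQueue), ordered by their stringDepth, and then read off the repeated substrings from longest to shortest.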
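If you just want to experiment with the "repeats ordered by length" idea without a suffix tree, here is a minimal sketch that uses a sorted list of suffixes instead. To be clear, this is not the suffix tree code referred to above: the class RepeatedSubstrings and its methods are names I made up for illustration, and the naive suffix sort is only reasonable because the string is around a thousand characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper class, not part of the suffix tree code mentioned above.
public class RepeatedSubstrings {

    // Length of the common prefix of the suffixes starting at a and b.
    static int commonPrefixLength(String s, int a, int b) {
        int k = 0;
        while (a + k < s.length() && b + k < s.length()
                && s.charAt(a + k) == s.charAt(b + k)) {
            k++;
        }
        return k;
    }

    // Distinct substrings that occur at least twice (occurrences may overlap),
    // longest first. Only the longest shared prefix of each pair of neighbouring
    // sorted suffixes is reported, so not every shorter repeat is listed.
    static List<String> repeatedSubstrings(String s, int minLength) {
        Integer[] starts = new Integer[s.length()];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i;
        }
        // Naive suffix sort: simple, and fast enough for ~1000 characters.
        Arrays.sort(starts, (a, b) -> s.substring(a).compareTo(s.substring(b)));

        // Order by length descending, then alphabetically to break ties.
        Comparator<String> longestFirst = (x, y) -> x.length() != y.length()
                ? Integer.compare(y.length(), x.length())
                : x.compareTo(y);
        TreeSet<String> repeats = new TreeSet<>(longestFirst);
        for (int i = 1; i < starts.length; i++) {
            int len = commonPrefixLength(s, starts[i - 1], starts[i]);
            if (len >= minLength) {
                repeats.add(s.substring(starts[i], starts[i] + len));
            }
        }
        return new ArrayList<>(repeats);
    }

    // Number of non-overlapping occurrences of sub in s, scanning left to right.
    static int countOccurrences(String s, String sub) {
        if (sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = s.indexOf(sub); i >= 0; i = s.indexOf(sub, i + sub.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "LLLLLLLLAATAALLLLALLLLLLAATAALLLATLLLLLAATAALLAALLLLLAATAALL";
        // Print each repeat of at least 10 characters with its occurrence count.
        for (String r : repeatedSubstrings(s, 10)) {
            System.out.println(r + " x" + countOccurrences(s, r));
        }
    }
}
```

As a quick check, repeatedSubstrings("banana", 2) returns [ana, na], and countOccurrences on the string from the question with LLLLLAATAALL returns 4. Note that the repeats found this way may overlap each other, so if you need non-overlapping repeats you would filter the candidates with countOccurrences afterwards.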