I would also like to know which algorithm has the worst case complexity of all for finding all occurrences of a string in another. Seems like Boyer–Moore’s algorithm has a linear time complexity.
Share
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
The KMP algorithm has linear complexity for finding all occurrences of a pattern in a string, like the Boyer-Moore algorithm¹. If you try to find a pattern like “aaaaaa” in a string like “aaaaaaaaa”, once you have the first complete match,
the border table contains the information that the next longest possible match (corresponding to the widest border of the pattern) of a prefix of the pattern is just one character short (a complete match is equivalent to a mismatch one past the end of the pattern in this respect). Thus the pattern is moved one place further, and since from the border table it is known that all characters of the pattern except possibly the last match, the next comparison is between the last pattern character and the aligned text character. In this particular case (find occurrences of am in an), which is the worst case for the naive matching algorithm, the KMP algorithm compares each text character exactly once.
In each step, at least one of
increases, and neither ever decreases. The position of the text character compared can increase at most
length(text)-1times, the position of the first pattern character can increase at mostlength(text) - length(pattern)times, so the algorithm takes at most2*length(text) - length(pattern) - 1steps.The preprocessing (construction of the border table) takes at most
2*length(pattern)steps, thus the overall complexity is O(m+n) and no morem + 2*nsteps are executed ifmis the length of the pattern andnthe length of the text.¹ Note that the Boyer-Moore algorithm as commonly presented has a worst-case complexity of O(m*n) for periodic patterns and texts like am and an if all matches are required, because after a complete match,
the entire pattern would be re-compared. To avoid that, you need to remember how long a prefix of the pattern still matches after the shift following a complete match and only compare the new characters.