The main function that builds the failure/prefix table in KMP as I have seen (in all online resources even in this answer_ is as follows:
int j = 0;
for (int i = 1; i < pattern.length(); i++) {
while (j > 0 && pattern.charAt(j) != pattern.charAt(i)) {
j = failure[j - 1];
}
if (pattern.charAt(j) == pattern.charAt(i)) {
j++;
}
failure[i] = j;
}
I can’t understand this part:
j = failure[j - 1];
Why don’t we do j-- instead to go back the string? How do we know that it is correct to update j using the failed table?
KMP string matching is correct if the i’th entry of the failure table is the length of the longest suffix-prefix match of the prefix of the pattern of length i.
Notice that if
A[0..k]is a suffix-prefix match ofA[0..i], thenA[0..k]is either the longest suffix-prefix match ofA[0..i]or it’s a suffix-prefix match of said longest suffix-prefix match.When you put these two things together, it turns out that we want
failure[i]to be the length of the longest suffix-prefix match ofpattern[0..i]. KMP preprocessing buildsfailureinductively, using the following fact:If the longest suffix-prefix match of
A[0..i]is nonempty, chopping off the last character will give some suffix-prefix match ofA[0..i-1].Consequently, the longest suffix-prefix match of
A[0..i]is either empty or it’s formed by extending the longest suffix-prefix match ofA[0..i-1]by one character or by extending a suffix-prefix match of that longest suffix-prefix match by one character. The failure function for the preceding characters gives you a straightforward way to iterate over all suffix-prefix matches ofpattern[0..i-1].