I am having trouble understanding the Boyer-Moore string search algorithm.
I am following this document: Link
I can't work out what delta1 and delta2 actually mean here, or how they are applied in the string search.
The language looks a little vague.
If anybody can help me understand this, it would be really helpful.
Or, if you know of another link or document that is easier to understand, please share it.
Thanks in advance.
The algorithm is based on a simple principle. Suppose that I’m trying to match a substring of length m. I’m going to first look at the character at index m of the text. If that character is not in my string, I know that the substring I want can’t start at any of the indices 1, 2, ... , m. If that character is in my string, I’ll assume that it is at the last place in my string that it can be. I’ll then jump back and start trying to match my string from that possible starting place. This piece of information is my first table.
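As a rough sketch of that first table in Python (the function names are made up for illustration and are not taken from the linked document, and the indices here are 0-based rather than the 1-based ones used above): the table records the last position of each character in the string, and the lookup turns the character under the end of the current window into the earliest place a match could still start.

```python
def build_last_occurrence(pattern):
    """Map each character of the pattern to the last index where it occurs."""
    last = {}
    for index, char in enumerate(pattern):
        last[char] = index  # later occurrences overwrite earlier ones
    return last


def earliest_possible_start(text, pattern, window_start):
    """Look at the text character under the last pattern position of the
    window beginning at window_start, and return the earliest start index
    at which a match is still possible."""
    last = build_last_occurrence(pattern)
    probe = window_start + len(pattern) - 1  # text index being inspected
    char = text[probe]
    if char not in last:
        # No match can overlap this character, so skip past it entirely.
        return probe + 1
    # Assume the character sits at its last position in the pattern.
    return probe - last[char]


print(build_last_occurrence("anand"))
print(earliest_possible_start("ananand", "anand", 0))
```

For "anand" in "ananand", the character under the end of the first window is "a", whose last position in "anand" is index 2, so the earliest start still worth trying is index 2.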
Once I start matching from the beginning of the substring, when I find a mismatch, I can’t just start from scratch. I could be partially through a match starting at a different point. For instance, if I’m trying to match "anand" in "ananand", I successfully match "anan", then realize that the following "a" is not a "d". But I’ve just matched "an", and so I should jump back to trying to match the third character of my substring. This "If I fail after matching x characters, I could be on the y’th character of a match" information is stored in the second table.
Note that when I fail to match, the second table knows how far along in a match I might be based on what I just matched. The first table knows how far back I might be based on the character that I just saw and failed to match. You want to use the more pessimistic of those two pieces of information.
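Here is a minimal sketch of that second table in Python, under the assumption that it is built exactly as described above: for each count x of matched characters, store the length y of the longest proper prefix of the string that is also a suffix of what was just matched. This is the same construction as the prefix (failure) function used by Knuth-Morris-Pratt; the delta2 table in the linked document may be defined differently, and the name build_fallback_table is made up for illustration.

```python
def build_fallback_table(pattern):
    """fallback[x] is the length of the longest proper prefix of the pattern
    that is also a suffix of pattern[:x], i.e. how many characters of a new
    match are already in hand after failing with x characters matched."""
    fallback = [0] * (len(pattern) + 1)
    k = 0  # length of the current prefix that is also a suffix
    for x in range(2, len(pattern) + 1):
        newest = pattern[x - 1]  # the x-th matched character
        # Shrink the candidate prefix until it can be extended by newest.
        while k > 0 and newest != pattern[k]:
            k = fallback[k]
        if newest == pattern[k]:
            k += 1
        fallback[x] = k
    return fallback


print(build_fallback_table("anand"))
```

For "anand", the entry for x = 4 (having matched "anan") is 2: the trailing "an" is also the start of the string, so matching resumes at the third character.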
With this in mind the algorithm works like this: