The Boyer-Moore algorithm has a preprocessing time of Θ(m + |Σ|) and a matching time of Ω(n/m) in the best case and O(n) in the worst case. I understand that Boyer-Moore-Horspool is a simplification of Boyer-Moore, yet according to this Wikipedia article its average-case complexity is O(n) and its worst case is O(mn). So in the worst case it should be slower than Boyer-Moore. But this classic survey from the University of Chile shows Boyer-Moore-Horspool outperforming Boyer-Moore almost every time. I am confused! Which one should I use for string searching (for both small and large patterns), and which algorithm matters more in practice? (I am just a computer science student.)
The key word is “almost”. The worst case may arise for only a vanishingly small fraction of inputs. Average behavior in real life and asymptotic behavior are also only loosely coupled. The best-case behavior of Boyer-Moore-Horspool is the same as that of Boyer-Moore, but its worst case is considerably worse (O(mn) versus O(n), per the bounds quoted in the question). For typical use, Boyer-Moore-Horspool performs about the same as Boyer-Moore, with somewhat lower overhead and initialization cost, since it builds only a single shift table instead of both a bad-character and a good-suffix table.
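To make the worst case concrete, here is a minimal Python sketch of Boyer-Moore-Horspool, instrumented to count character comparisons. The name horspool_search and the comparison counter are illustrative assumptions, not taken from the survey or the Wikipedia article. Because Horspool's shift depends only on the text character aligned with the last pattern position, a text of all 'a' searched for 'b' followed by nine 'a's forces about m comparisons and then a shift of just 1 at every alignment, which is where the O(mn) bound comes from.

```python
def horspool_search(text, pattern):
    """Return (match positions, number of character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # Shift table built from pattern[:-1]: distance from the last occurrence
    # of each character to the end of the pattern. Characters that do not
    # occur there shift by the full pattern length m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches, comparisons = [], 0
    pos = 0
    while pos <= n - m:
        # Compare the pattern right to left at this alignment.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # The shift uses only the text character under the pattern's last
        # position, no matter where the mismatch happened.
        pos += shift.get(text[pos + m - 1], m)
    return matches, comparisons


# Adversarial input: every alignment costs m comparisons and shifts by 1.
text = "a" * 1000
pattern = "b" + "a" * 9
found, comps = horspool_search(text, pattern)
print(found, comps)  # [] 9910
```

On ordinary text with a reasonably large alphabet, most alignments mismatch on the first comparison and the shift is close to m, which is why Horspool usually does so well in practice.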
Which one to use? It depends on your goals and on what kinds of patterns and text you expect to search. Neither is particularly hard to implement, so why not implement both and compare the results yourself? (See what happens when you admit that you’re a student? You get an assignment! :))
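One possible starting point for that comparison is a textbook Boyer-Moore using both the bad-character rule and the strong good-suffix rule, counting character comparisons. This is a sketch under stated assumptions: the names boyer_moore_search and good_suffix_shifts are hypothetical, and the good-suffix preprocessing follows the common border-array formulation rather than any implementation from the survey. On the all-'a' text with pattern 'b' plus nine 'a's, the good-suffix rule shifts by the full pattern length, so the comparison count stays linear in n.

```python
def good_suffix_shifts(pattern):
    """Strong good-suffix shift table, indexed by (mismatch position + 1).

    shift[0] is the shift to use after a full match.
    """
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)
    # Case 1: the matched suffix occurs elsewhere in the pattern,
    # preceded by a different character.
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Case 2: only a prefix of the pattern matches part of the matched suffix.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore_search(text, pattern):
    """Return (match positions, number of character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # Bad-character rule: last index of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    good = good_suffix_shifts(pattern)
    matches, comparisons = [], 0
    s = 0
    while s <= n - m:
        # Compare the pattern right to left at alignment s.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(s)
            s += good[0]
        else:
            # Both rules give safe shifts, so take the larger one.
            bad_char = j - last.get(text[s + j], -1)
            s += max(good[j + 1], bad_char)
    return matches, comparisons


# Same adversarial input: each alignment costs m comparisons but shifts by m.
text = "a" * 1000
pattern = "b" + "a" * 9
found, comps = boyer_moore_search(text, pattern)
print(found, comps)  # [] 1000
```

Running a Horspool implementation on the same input, and then on some real text, gives a quick feel for the trade-off: Boyer-Moore does more preprocessing for its second table but is protected against the pathological cases.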