I want an efficient algorithm to find all occurrences of a pattern within a larger sequence.
For example, given the following input:
Pattern: GAS
Sequence: ASDFGASDFGASDFADFASDFGA
Expected output: {4, 9} (the zero-based indices at which each match starts)
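For reference, here is a minimal sketch of the straightforward approach, written in Python for brevity. The function name `find_all_naive` is hypothetical. It tries every alignment and compares element by element, stopping at the first mismatch. This O(nm) baseline reproduces the expected output above and is what faster algorithms try to improve on.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def find_all_naive(pattern: Sequence[T], sequence: Sequence[T]) -> List[int]:
    """Return every zero-based index where `pattern` starts in `sequence`.

    Brute force: try each alignment and compare element by element.
    Worst-case time is O(n*m) for n = len(sequence), m = len(pattern).
    """
    n, m = len(sequence), len(pattern)
    if m == 0:
        return []  # design choice: an empty pattern reports no matches
    matches: List[int] = []
    for start in range(n - m + 1):
        # all() short-circuits on the first mismatching element
        if all(sequence[start + k] == pattern[k] for k in range(m)):
            matches.append(start)
    return matches


if __name__ == "__main__":
    print(find_all_naive("GAS", "ASDFGASDFGASDFADFASDFGA"))  # [4, 9]
```

Because it only uses `len`, indexing, and `==`, the same function works on strings, lists, or `bytes`.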
The accepted answer to a similar question implements an algorithm for this task. However, one comment reports that the algorithm is “slow on large bytes array”.
After reading around, it appears the best algorithm for this is the Boyer-Moore string-search algorithm, and there is an implementation in C# on CodeProject, but I’m having trouble adapting it to generic enumerables.
Is there any existing solution based on the Boyer-Moore algorithm to find all occurrences of a pattern in a generic sequence in .NET?
Note
Although I used strings in my example, I want an answer that works on any data that implements IEnumerable. In other words, it should work not only on strings but on sequences of any element type.
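One way to make Boyer-Moore generic is to replace its usual 256-entry shift array with a dictionary keyed by element. The sketch below uses the Boyer-Moore-Horspool simplification, which keeps only the bad-character rule and drops the good-suffix rule. It is written in Python for brevity, and the name `find_all_horspool` is hypothetical. It assumes the elements are hashable and that the input supports random access. Horspool, like full Boyer-Moore, jumps ahead and compares right to left, so a one-shot IEnumerable would have to be buffered first (for example with `ToArray()` in .NET).

```python
from typing import Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def find_all_horspool(pattern: Sequence[T], sequence: Sequence[T]) -> List[int]:
    """Boyer-Moore-Horspool search over an indexable sequence of hashable items.

    Returns the zero-based start index of every (possibly overlapping) match.
    """
    m, n = len(pattern), len(sequence)
    if m == 0 or m > n:
        return []

    # Bad-character table: for each element of pattern[:-1], the distance from
    # its last occurrence to the end of the pattern. Elements missing from the
    # table allow a shift of the full pattern length.
    shift: Dict[T, int] = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    matches: List[int] = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, as Boyer-Moore does.
        k = m - 1
        while k >= 0 and sequence[pos + k] == pattern[k]:
            k -= 1
        if k < 0:
            matches.append(pos)
        # Shift based on the element under the pattern's last position.
        # This shift is safe after a match too, so overlapping matches are found.
        pos += shift.get(sequence[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(find_all_horspool("GAS", "ASDFGASDFGASDFADFASDFGA"))  # [4, 9]
```

In C#, a `Dictionary<T, int>` built with an `IEqualityComparer<T>` (or `EqualityComparer<T>.Default`) could play the role of the shift table. The shift table costs O(m) space. The worst case is still O(nm), but on inputs with many distinct elements the search typically skips most positions.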
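After struggling in vain to understand the Boyer-Moore algorithm, I put together this code, which does the pattern matching in a single pass over the larger collection.
I have not been able to compare it against the Boyer-Moore algorithm, but it performs quite efficiently, with O(nm) worst-case time, where n is the length of the sequence and m the length of the pattern. The worst case arises when many positions are partial matches of the pattern at once, for example when the sequence and the pattern both consist of a single repeated element.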
Here is my implementation. Let me know your views on it.
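As a hypothetical illustration of the single-pass idea, written independently of the implementation referred to above, one option is to keep a sliding window of the last m elements and compare it with the pattern after each new element. The sketch below is in Python, and the name `iter_matches_single_pass` is an assumption. It reads the input exactly once and never stores more than m elements, so it also works on one-shot iterators and streams. The cost is O(nm) in the worst case.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_matches_single_pass(pattern: Sequence[T], items: Iterable[T]) -> Iterator[int]:
    """Yield the zero-based start index of each occurrence of `pattern` in `items`.

    `items` is consumed exactly once; memory use is bounded by len(pattern).
    """
    m = len(pattern)
    if m == 0:
        return  # design choice: an empty pattern yields nothing
    window: deque = deque(maxlen=m)  # holds the last m items seen, oldest first
    for index, item in enumerate(items):
        window.append(item)  # once full, the oldest item is dropped automatically
        # all() stops at the first mismatch, so the full m comparisons happen
        # only when the window agrees with a long prefix of the pattern.
        if len(window) == m and all(a == b for a, b in zip(window, pattern)):
            yield index - m + 1


if __name__ == "__main__":
    stream = iter("ASDFGASDFGASDFADFASDFGA")  # a one-shot iterator
    print(list(iter_matches_single_pass("GAS", stream)))  # [4, 9]
```

The trade-off is the opposite of Boyer-Moore's. This approach needs no random access or buffering, but it cannot skip elements, so every element of the sequence is examined.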