I need to write an efficient, fast method to search a byte array for a given pattern.
I wrote it this way. What do you think, and how can I improve it? It also has one bug: it cannot return a match for a pattern of length 1.
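public static bool SearchByteByByte(byte[] bytes, byte[] pattern)
{
    bool found = false; // never set to true; a match returns directly
    int matchedBytes = 0;
    for (int i = 0; i < bytes.Length; i++)
    {
        // First byte matches and there is room for the rest of the pattern.
        if (pattern[0] == bytes[i] && bytes.Length - i >= pattern.Length)
        {
            // Check the remaining bytes. When pattern.Length == 1 this loop
            // never runs, which is why a 1-byte match is never reported.
            for (int j = 1; j < pattern.Length; j++)
            {
                if (bytes[i + j] == pattern[j])
                {
                    matchedBytes++;
                    if (matchedBytes == pattern.Length - 1)
                    {
                        return true;
                    }
                    continue;
                }
                else
                {
                    matchedBytes = 0;
                    break;
                }
            }
        }
    }
    return found;
}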
Any suggestions?
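For reference, here is one way to fix the byte-by-byte version itself. This is a minimal sketch, not the asker's code: the class name NaiveByteSearch is hypothetical, it returns the match index (or -1) instead of a bool, and a Contains wrapper keeps the original true/false contract. Returning 0 for an empty pattern is an assumption that mirrors string.IndexOf.

```csharp
using System;

public static class NaiveByteSearch
{
    // Returns the index of the first occurrence of pattern in bytes, or -1 if absent.
    public static int IndexOf(byte[] bytes, byte[] pattern)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));
        if (pattern.Length == 0) return 0; // assumption: empty pattern matches at 0

        // Only try start positions that leave room for the whole pattern,
        // so bytes[i + j] below can never run past the end of the array.
        int lastStart = bytes.Length - pattern.Length;
        for (int i = 0; i <= lastStart; i++)
        {
            // Counting matched bytes from j = 0 (not j = 1 as in the question)
            // means a 1-byte pattern is handled by the same loop.
            int j = 0;
            while (j < pattern.Length && bytes[i + j] == pattern[j])
            {
                j++;
            }
            if (j == pattern.Length)
            {
                return i; // every byte matched
            }
        }
        return -1;
    }

    // Same contract as the original method: true if the pattern occurs anywhere.
    public static bool Contains(byte[] bytes, byte[] pattern) => IndexOf(bytes, pattern) >= 0;
}
```

The length-1 bug disappears because the first byte is no longer checked separately and then skipped: the inner comparison starts at j = 0 and counts it like every other byte.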
The Boyer-Moore algorithm, which grep uses, is quite efficient, and it gets more efficient as the pattern gets longer. I’m pretty sure you could adapt it to a byte array without much difficulty, and its Wikipedia page has a Java implementation that should be easy to port to C#.
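To see why adapting it to bytes is straightforward, here is a sketch of the bad-character shift table used by the Horspool simplification of Boyer-Moore. Because a byte has only 256 possible values, the table is a plain int[256] indexed directly by the byte. The class and method names are hypothetical, and this is only one of the tables the full algorithm uses.

```csharp
using System;

public static class BadCharacterTable
{
    // Builds the bad-character shift table for a Horspool-style search.
    public static int[] Build(byte[] needle)
    {
        if (needle == null || needle.Length == 0)
            throw new ArgumentException("needle must be non-empty", nameof(needle));

        var shift = new int[256];
        // A byte that does not occur in the needle lets the search skip a whole needle length.
        for (int b = 0; b < shift.Length; b++)
        {
            shift[b] = needle.Length;
        }
        // For bytes that do occur (ignoring the last position), the shift is the distance
        // from their rightmost occurrence to the end of the needle.
        for (int i = 0; i < needle.Length - 1; i++)
        {
            shift[needle[i]] = needle.Length - 1 - i;
        }
        return shift;
    }
}
```

With the needle { 10, 20, 30, 20 }, for example, Build gives shift[30] == 1, shift[20] == 2, shift[10] == 3, and 4 for every other byte. The maximum skip equals the needle length, which is why longer patterns tend to search faster.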
UPDATE:
Here’s an implementation of a simplified version of the Boyer-Moore algorithm for byte arrays in C#. It only uses the second jump table of the full algorithm. For the array sizes you mentioned (haystack: 2,000,000 bytes; needle: 10 bytes), it’s about 5 to 8 times faster than a simple byte-by-byte algorithm.
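As an illustration of this kind of simplified search, here is a minimal sketch in the Boyer-Moore-Horspool style, which keeps only the bad-character table. It is not necessarily the same table the answer calls the "second jump table", the class name HorspoolSearch is hypothetical, and returning 0 for an empty needle is an assumption. It returns the index of the first match or -1.

```csharp
using System;

public static class HorspoolSearch
{
    // Returns the index of the first occurrence of needle in haystack, or -1.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        if (haystack == null) throw new ArgumentNullException(nameof(haystack));
        if (needle == null) throw new ArgumentNullException(nameof(needle));
        if (needle.Length == 0) return 0; // assumption: empty needle matches at 0

        int m = needle.Length;

        // Shift table indexed by byte value: the full needle length by default,
        // otherwise the distance from the byte's rightmost occurrence
        // (ignoring the last position) to the end of the needle.
        var shift = new int[256];
        for (int b = 0; b < 256; b++) shift[b] = m;
        for (int i = 0; i < m - 1; i++) shift[needle[i]] = m - 1 - i;

        int pos = 0; // position in haystack aligned with needle[0]
        while (pos <= haystack.Length - m)
        {
            // Compare right to left, as Boyer-Moore does.
            int j = m - 1;
            while (j >= 0 && haystack[pos + j] == needle[j])
            {
                j--;
            }
            if (j < 0)
            {
                return pos; // full match
            }
            // Skip based on the haystack byte under the needle's last position.
            pos += shift[haystack[pos + m - 1]];
        }
        return -1;
    }
}
```

Every shift is at least 1, so the loop always makes progress, and a 1-byte needle degrades gracefully to a linear scan. How much faster this is than a byte-by-byte scan depends on the haystack contents and the needle length, so it is worth timing on your own data, for example with System.Diagnostics.Stopwatch.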