I have a UTF-8 byte array of data. I would like to search for a specific string in the array of bytes in C#.
```csharp
byte[] dataArray;               // some UTF-8 byte array of data
string searchString = "Hello";
```
How do I find the first occurrence of the word “Hello” in the array dataArray and return an index location where the string begins (where the ‘H’ from ‘Hello’ would be located in dataArray)?
Before, I was erroneously using something like:
```csharp
int helloIndex = Encoding.UTF8.GetString(dataArray).IndexOf("Hello");
```
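Obviously, that code is not guaranteed to work, because it returns an index into the decoded string (counted in UTF-16 characters), not a byte offset into the UTF-8 array. The two differ as soon as any multi-byte character comes before the match. Are there any built-in C# methods, or proven, efficient code I can reuse?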
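A small demonstration of the mismatch, assuming the text "Café Hello" (é is U+00E9, which takes two bytes in UTF-8). The class and method names are made up for this sketch. It computes the string index the original code would return, plus the byte offset of the same match by counting the UTF-8 bytes of the text before it. That second step only round-trips correctly for valid UTF-8 input, so treat it as an illustration rather than a general fix.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of any library.
public static class IndexMismatchDemo
{
    // Returns the index found in the decoded string (UTF-16 code units)
    // and the byte offset of the same match in the UTF-8 data.
    public static (int charIndex, int byteIndex) Compare(string text, string search)
    {
        byte[] data = Encoding.UTF8.GetBytes(text);
        string decoded = Encoding.UTF8.GetString(data);

        // Ordinal comparison: plain IndexOf(string) is culture-sensitive.
        int charIndex = decoded.IndexOf(search, StringComparison.Ordinal);
        if (charIndex < 0) return (-1, -1);

        // Byte offset = number of UTF-8 bytes used by the text before the match.
        int byteIndex = Encoding.UTF8.GetByteCount(decoded.Substring(0, charIndex));
        return (charIndex, byteIndex);
    }
}
```

For "Caf\u00e9 Hello" and "Hello", the string index is 5 but the 'H' sits at byte offset 6.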
Thanks,
Matt
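One of the nice features of UTF-8 is that it is self-synchronizing: if a sequence of bytes represents a character, and that sequence appears anywhere in valid UTF-8 encoded data, then it always represents that character. Lead bytes and continuation bytes (10xxxxxx) use disjoint bit patterns, so in valid data a match can never begin or end in the middle of another character.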
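Because of this property, even a plain byte-by-byte comparison gives the right answer. Below is a minimal sketch, assuming valid UTF-8 input; the class and method names are hypothetical. The span-based method uses MemoryExtensions.IndexOf, which is built in on .NET Core 2.1 and later (or via the System.Memory package on older frameworks).

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration.
public static class NaiveByteSearch
{
    // Naive O(n*m) search: try the needle at every starting offset.
    // Correct for valid UTF-8 because a byte match can only start on a
    // character boundary.
    public static int IndexOfNaive(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0) return 0;
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i; // byte offset of the match
        }
        return -1;
    }

    // Built-in alternative: MemoryExtensions.IndexOf over read-only spans.
    public static int IndexOfSpan(byte[] haystack, byte[] needle) =>
        new ReadOnlySpan<byte>(haystack).IndexOf(new ReadOnlySpan<byte>(needle));
}
```

With the question's variables, either call would look like NaiveByteSearch.IndexOfSpan(dataArray, Encoding.UTF8.GetBytes(searchString)).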
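Knowing this, you can convert the string you are searching for to a byte array (for example with Encoding.UTF8.GetBytes) and then use the Boyer-Moore string searching algorithm (or any other string searching algorithm you like), adapted slightly to work on byte arrays instead of strings. The index of the first matching byte is then the byte offset of the 'H' in dataArray.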
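As one possible adaptation, here is a sketch of Boyer-Moore-Horspool on byte arrays. Horspool is a simplified variant of Boyer-Moore that keeps only the "bad character" shift table (one entry per byte value) and drops the good-suffix rule. The class name is hypothetical, and the sketch assumes the search string and the data are both valid UTF-8.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration.
public static class HorspoolSearch
{
    // Boyer-Moore-Horspool over bytes: returns the byte offset of the first
    // match, or -1 if the needle does not occur.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        if (haystack == null) throw new ArgumentNullException(nameof(haystack));
        if (needle == null) throw new ArgumentNullException(nameof(needle));
        int m = needle.Length;
        if (m == 0) return 0;
        if (m > haystack.Length) return -1;

        // Default shift is the full needle length.
        int[] shift = new int[256];
        for (int b = 0; b < 256; b++) shift[b] = m;
        // For every needle byte except the last, record the distance from its
        // rightmost occurrence to the end of the needle.
        for (int i = 0; i < m - 1; i++) shift[needle[i]] = m - 1 - i;

        int pos = 0;
        while (pos <= haystack.Length - m)
        {
            // Compare the current window right to left.
            int j = m - 1;
            while (j >= 0 && haystack[pos + j] == needle[j]) j--;
            if (j < 0) return pos;
            // Skip ahead based on the last byte of the window.
            pos += shift[haystack[pos + m - 1]];
        }
        return -1;
    }

    // Convenience overload: encode the search string as UTF-8 first.
    public static int IndexOf(byte[] haystack, string search) =>
        IndexOf(haystack, Encoding.UTF8.GetBytes(search));
}
```

With the question's variables this would be HorspoolSearch.IndexOf(dataArray, searchString). Note that the comparison is on raw bytes, so it is case-sensitive and does no Unicode normalization: a precomposed "é" will not match a decomposed "e" plus combining accent. For short needles like "Hello", the skip table may not buy much over the naive or span-based search.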
There are a number of answers here that can help you: