Say I only need to find out whether a line read from a file contains a word from a finite set of words.
One way of doing this is to use a regex like this:
.*\y(good|better|best)\y.*
Another way of accomplishing this is using a pseudo code like this:
if ( (readLine.find("good") != string::npos) ||
(readLine.find("better") != string::npos) ||
(readLine.find("best") != string::npos) )
{
// line contains a word from a finite set of words.
}
Which way will have better performance? (i.e. speed and CPU utilization)
The regexp will perform better, but get rid of those ‘.*’ parts. They complicate the code and don’t serve any purpose. A regexp like this:
will search through the string in a single pass. The algorithm it builds from this regexp will look first for \y, then character 1 (g|b), then character 2 (g => go or b => be), character 3 (go => goo or be => bes|bet), character 4 (go => good or bes => best or bet => bett), etc. Without building your own state machine, this is as fast as it gets.