If I need to check whether, for example, word A or word B exists in a text (a String), is there a performance difference between doing:
if(text.contains(wordA) || text.contains(wordB))
and using a regular expression that searches the string?
Does it depend on the regular expression format?
Or is it just a matter of taste?
UPDATE:
If text.contains(wordA) is false, then text.contains(wordB) will be evaluated as well.
This means that contains may be called twice.
I was wondering whether, in performance terms, a regex might be better than calling contains twice.
With this trivial example you shouldn’t see much of a performance difference, but purely from the algorithms involved, a regular expression could indeed be faster: an engine backed by a finite automaton makes a single pass through the string to match either of the two substrings, whereas two contains calls may scan the text twice. However, this is offset by having to build the automaton first, which should be pretty much linear in the length of the regex in this case. You can compile the regex first so that you pay that cost only once for as long as the compiled object lives. (If this is Java, note that java.util.regex is a backtracking engine rather than a pure DFA, so the single-pass argument holds only approximately there.)
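To make the "compile once" point concrete, here is a minimal sketch, assuming the question is about Java. The class name, method names and the two example words are hypothetical; Pattern.quote is used so that the words are matched literally even if they contain regex metacharacters.

```java
import java.util.regex.Pattern;

public class ContainsEither {
    // Hypothetical example words; substitute your own.
    private static final String WORD_A = "foo";
    private static final String WORD_B = "bar";

    // Compiled once and reused, so the construction cost is paid a single time.
    // Pattern.quote escapes any regex metacharacters in the words.
    private static final Pattern EITHER =
            Pattern.compile(Pattern.quote(WORD_A) + "|" + Pattern.quote(WORD_B));

    // The approach from the question: up to two scans of the text.
    static boolean viaContains(String text) {
        return text.contains(WORD_A) || text.contains(WORD_B);
    }

    // The regex approach: one find() call using the precompiled pattern.
    static boolean viaRegex(String text) {
        return EITHER.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "a text with bar in it";
        System.out.println(viaContains(text));
        System.out.println(viaRegex(text));
    }
}
```

Both methods should give the same answer for any input; which one is faster for your data is something only a measurement can tell you.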
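So essentially the trade-off comes down to the one-time cost of building the automaton versus the cost of a second pass over the text: if your text is very large and the substrings very small, then a regex could be worthwhile.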
Still, you’re optimising the wrong place, most likely. Use a profiler to find the actual bottlenecks in your code and optimise those; don’t ever worry about such trivial “optimisations” unless you can prove them to make an impact.
One final thing to consider, though: with a regex you could make sure you’re actually matching words (or things that look like words) instead of word parts, which might be an actual reason to consider regex instead of contains.
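As an illustration of the whole-word point, here is a small sketch, again assuming Java. The class name, method names and the words "cat" and "dog" are made up for the example. It relies on the \b word-boundary construct, which only behaves as expected when the words themselves start and end with word characters.

```java
import java.util.regex.Pattern;

public class WholeWordMatch {
    // Hypothetical example words; \b requires a word boundary on each side.
    private static final Pattern WHOLE_WORD = Pattern.compile(
            "\\b(?:" + Pattern.quote("cat") + "|" + Pattern.quote("dog") + ")\\b");

    // True only if "cat" or "dog" occurs as a standalone word.
    static boolean containsWholeWord(String text) {
        return WHOLE_WORD.matcher(text).find();
    }

    // Plain substring check for comparison; also matches inside longer words.
    static boolean containsSubstring(String text) {
        return text.contains("cat") || text.contains("dog");
    }

    public static void main(String[] args) {
        String text = "concatenate";
        System.out.println(containsSubstring(text)); // substring check
        System.out.println(containsWholeWord(text)); // whole-word check
    }
}
```

For the input "concatenate" the substring check succeeds because "cat" appears inside the word, while the whole-word check does not, which is exactly the difference described above.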