I have a pretty simple regular expression that I am using:
%%(products?)%%
Now I want it to match both products? and Products?. The obvious answer is to use the CASE_INSENSITIVE flag when compiling the pattern:
Pattern.compile("%%(products?)%%", Pattern.CASE_INSENSITIVE)
But the documentation says “Specifying this flag may impose a slight performance penalty.” I therefore thought of an alternative without the flag:
Pattern.compile("%%([Pp]roducts?)%%")
My question is: Which one would have better performance?
Actually, there is a significant difference between the methods.
While
Pattern.compile("%%(products?)%%", Pattern.CASE_INSENSITIVE)
might seem less efficient than
Pattern.compile("%%([Pp]roducts?)%%")
at first glance, its internal functioning is not simply a matter of comparing each character with both its lowercase and uppercase counterparts. As far as I understand it, the first method does a range check against the lowercase and uppercase character ranges, while the second makes literal comparisons. (Note that CASE_INSENSITIVE alone only covers US-ASCII letters; Unicode-aware case folding additionally requires the UNICODE_CASE flag.)
I don’t have knowledge much deeper than that, but the important part is this simple but very interesting test (results on my machine included at the end):
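A comparison along these lines could be sketched as follows. This is only a minimal illustration, not necessarily the code behind the figures quoted in this answer: the class name RegexCaseBenchmark, the helper methods, the input string and the run count are all assumptions made for the example, and a naive System.nanoTime() loop like this is sensitive to JIT warm-up and machine load, so the numbers it prints should be treated as rough.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class RegexCaseBenchmark {
    // The two patterns compared in this answer.
    static final Pattern FLAGGED =
            Pattern.compile("%%(products?)%%", Pattern.CASE_INSENSITIVE);
    static final Pattern CHAR_CLASS =
            Pattern.compile("%%([Pp]roducts?)%%");

    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Runs countMatches `runs` times and returns the elapsed wall-clock time in ms.
    static long timeMillis(Pattern pattern, String input, int runs) {
        long start = System.nanoTime();
        int total = 0;
        for (int i = 0; i < runs; i++) {
            total += countMatches(pattern, input);
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        // Use the accumulated result so the loop cannot be discarded as dead code.
        if (total < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Assumed input and run count, chosen only for illustration.
        String input = "Buy %%product%% or %%Products%% today, not %%PRODUCTS%%.";
        int runs = 1_000_000;

        // Warm-up pass so both code paths are JIT-compiled before timing.
        timeMillis(FLAGGED, input, runs);
        timeMillis(CHAR_CLASS, input, runs);

        System.out.println("CASE_INSENSITIVE: " + timeMillis(FLAGGED, input, runs) + " ms");
        System.out.println("[Pp] class:       " + timeMillis(CHAR_CLASS, input, runs) + " ms");
    }
}
```

Note that the two patterns are not strictly equivalent: with the flag, %%PRODUCTS%% also matches, whereas [Pp] only relaxes the first letter. For more trustworthy timings, a harness such as JMH would be a better choice than a hand-rolled loop.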
As you can see, not only is the first method faster (at least on the machine it was run on), but the performance difference of almost 800 ms (0.8 s) over a large number of runs might not be as negligible as one might think!