I have a script which looks for a file using a regular expression. The code was the following:
find $dir | grep "$regex"
The script runs a bit too slowly and I want to optimise it. I tried this instead:
find $dir -regex ".*${regex}.*"
I was expecting slightly faster results as no extra process is created to parse the regular expression.
However, the result was different: to my astonishment, “find | grep” is faster than “find -regex” (although it takes more system time, as one would have expected).
I’ve timed this behaviour:
Find | grep result
real 0m12.467s
user 0m2.568s
sys 0m7.260s
Find -regex result
real 0m16.778s
user 0m6.772s
sys 0m6.380s
Do you have any idea why the find -regex solution is slower?
Most likely because grep and its regex engine have been highly optimized over many years, since that’s its only purpose (“do one thing and do it well”). I don’t know what regex engine find uses, but it’s evidently not as highly refined as grep’s, probably because it’s a less-often-used secondary feature.
Also, if you are doing anything with this file list, you should really use a more whitespace-safe way of doing this. GNU grep can read null-delimited input with -z (--null-data), and can output it too; where that option is not available, you should use find [...] -regex [...] -print0 even though it’s slower.
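Here is a minimal sketch of a whitespace-safe version of the pipeline. It assumes GNU find, GNU grep (whose -z option makes it read and write NUL-terminated records) and an xargs that supports -0. The helper name find_matching and the demo file names are made up for illustration, and this has not been timed against the commands in the question.

```shell
#!/bin/sh
# Sketch only: assumes GNU find, GNU grep (-z) and an xargs that supports -0.

# Hypothetical helper: list paths under directory $1 that match regex $2,
# one NUL-terminated record per path.
find_matching() {
    # -print0 ends each path with a NUL byte; grep -z reads and writes
    # NUL-terminated records, so spaces or newlines in names stay intact.
    find "$1" -print0 | grep -z -- "$2"
}

# Demo on a throwaway directory (file names are made up for illustration).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/report 2020.txt" "$demo_dir/notes.md"

# xargs -0 splits on NUL, so each path reaches printf as a single argument.
find_matching "$demo_dir" 'report.*txt' | xargs -0 printf 'match: %s\n'
```

Note that grep matches the regex anywhere in the path, as in the original find | grep, so no leading or trailing .* is needed.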