I wrote a regular expression that parses a file path into different groups (DRIVE, DIR, FILE, EXTENSION).
^((?<DRIVE>[a-zA-Z]):\\)*((?<DIR>[a-zA-Z0-9_]+(([a-zA-Z0-9_\s_\-\.]*[a-zA-Z0-9_]+)|([a-zA-Z0-9_]+)))\\)*(?<FILE>([a-zA-Z0-9_]+(([a-zA-Z0-9_\s_\-\.]*[a-zA-Z0-9_]+)|([a-zA-Z0-9_]+))\.(?<EXTENSION>[a-zA-Z0-9]{1,6})$))
I wrote a test in C#. When the path I test is valid, the result comes back very quickly, which is what I expected.
string path = @"C:\Documents and Settings\jhr\My Documents\Visual Studio 2010\Projects\FileEncryptor\Dds.FileEncryptor\Dds.FileEncryptor.csproj";
=> OK
But when I test a path that I know will not match, like this:
string path = @"C:\Documents and Settings\jhr\My Documents\Visual Studio 2010\Projects\FileEncryptor\Dds.FileEncryptor\Dds.FileEncryptor?!??????";
=> BUG
The test freezes when I call this line of code:
Match match = s_fileRegex.Match(path);
When I look in Process Explorer, I see the process QTAgent32.exe stuck at 100% of my processor. What does this mean?
The problem you are experiencing is called catastrophic backtracking. It is caused by the large number of ways in which your regular expression can match the start of the string. When the overall match fails, the backtracking regular expression engine in .NET tries every one of those combinations before giving up, which is why the non-matching path is so slow.
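While the pattern is being fixed, one way to keep a test from hanging is to give the match a time limit. The following is only a minimal sketch: it assumes .NET 4.5 or later (where the Regex overloads that take a match timeout were introduced), and the RegexGuard class and TryIsMatch method are hypothetical names made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexGuard
{
    // Hypothetical helper (not from the question): returns true or false when
    // the match attempt completes, or null when the engine exceeds the timeout.
    public static bool? TryIsMatch(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            // Static overload with a match timeout (assumes .NET 4.5+).
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine was still backtracking when the time limit elapsed.
            return null;
        }
    }
}
```

For instance, calling RegexGuard.TryIsMatch(@"^(a+)+$", new string('a', 40) + "!", TimeSpan.FromMilliseconds(200)) with this classic nested-quantifier pattern should come back promptly with either false or null instead of freezing the process. A timeout only contains the symptom; the pattern itself still needs to be corrected.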
I think you are using * too frequently in your regular expression. * does not mean "concatenate"; it means "0 or more times". For example, there should not be a * after the drive group, ((?<DRIVE>[a-zA-Z]):\\)*, because there should be at most one drive specification. You should use ? instead here, or else no quantifier at all if you want the drive specification to be compulsory. Similarly, there appear to be other places in your regular expression where the quantifier is incorrect.
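To make the advice concrete, here is one possible rewrite of the pattern. It is a sketch under stated assumptions rather than a definitive fix: the drive group uses ? so it can appear at most once, and each path segment is written as a first character followed by an optional "middle plus last character" part, so a segment can only be matched in one way. Unlike the original, this version also accepts one-character directory and file names. The PathPattern class and Parse method are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathPattern
{
    // Hypothetical rewrite of the question's pattern (an assumption, not the
    // author's final regex). The group names are kept from the original.
    private static readonly Regex s_fileRegex = new Regex(
        // Optional drive, at most once: "C:\"
        @"^(?:(?<DRIVE>[a-zA-Z]):\\)?" +
        // Zero or more directories, each terminated by a backslash
        @"(?:(?<DIR>[a-zA-Z0-9_](?:[a-zA-Z0-9_\s\-\.]*[a-zA-Z0-9_])?)\\)*" +
        // File name ending in a dot and a 1-6 character extension
        @"(?<FILE>[a-zA-Z0-9_](?:[a-zA-Z0-9_\s\-\.]*[a-zA-Z0-9_])?" +
        @"\.(?<EXTENSION>[a-zA-Z0-9]{1,6}))$");

    public static Match Parse(string path)
    {
        return s_fileRegex.Match(path);
    }
}
```

Because no character class here can match a backslash, the directory boundaries are fixed by the input, and the engine has far fewer alternatives to retry when the file name at the end turns out to be invalid.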