I’ve got a working regular expression, but I’d like to make it a tad more readable, and I’m far from a regex guru, so I was humbly hoping for some tips.
This is designed to scrape the output of several different compilers, linkers, and other build tools, and is used to build a nice little summary report. It does its job great, but I’m left feeling like I wrote it in a clunky fashion, and I’d sooner learn than keep it the wrong way.
(.*?)\s?:?\s?(informational|warning|error|fatal error)?\s([A-Z]+[0-9][0-9][0-9][0-9]):\s(.*)$
Which, broken down simply, is as follows:
(.*?) # non-greedily match up until...
\s?:?\s? # we come across a possible " : "
(informational|warning|error|fatal error)? # possibly followed by one of these
\s([A-Z]+[0-9][0-9][0-9][0-9]):\s # but 100% followed by this alphanum
(.*)$ # and then capture the rest
I’m mostly interested in making the 2nd and 4th entries above more… beautiful. For some reason, the regex tester I was using (The Regulator) didn’t match plain spaces, so I had to use \s, but it is not meant to match any other whitespace.
Any schooling will be greatly appreciated.
Line 2
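I think your regular expression doesn’t match the comment: \s?:?\s? makes each of the three pieces optional on its own, so it also accepts a lone ":" or a lone space, not just a complete " : ". You probably want this instead: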
To make it non-capturing:
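As a rough sketch of the difference, assuming the intent is an optional " : " that is either fully present or fully absent, the separator can be wrapped in a group and the group made optional. The names `loose` and `grouped` below are hypothetical, and Python's `re` module is used only for illustration; the (?:...) syntax is shared by most regex flavors.

```python
import re

# Original separator: each of the three pieces is optional on its own.
loose = re.compile(r"\s?:?\s?")

# Grouped separator: the whole " : " is optional as a single unit.
# (?:...) groups without creating a numbered capture group.
grouped = re.compile(r"(?:\s:\s)?")

for sample in [" : ", ":", " ", ""]:
    print(
        repr(sample),
        "loose:", loose.fullmatch(sample) is not None,
        "grouped:", grouped.fullmatch(sample) is not None,
    )
```

The loose form accepts all four samples, while the grouped form accepts only the complete " : " and the empty string. Because the group is non-capturing, the numbering of the other capture groups in the full expression stays the same.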
You should be able to use a literal space instead of \s. This must be a restriction in the tool you are using.
Line 4
[0-9][0-9][0-9][0-9] can be replaced with [0-9]{4}. In some languages, [0-9] is equivalent to \d.
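To see these tips applied together, here is a minimal sketch of the whole expression in Python's verbose mode, which allows the commented breakdown from the question to live inside the pattern itself. The name `PATTERN` and the sample build line are made up for illustration. In verbose mode unescaped whitespace is ignored, so a literal space is written as `[ ]`.

```python
import re

PATTERN = re.compile(
    r"""
    (.*?)                                            # non-greedily match up until...
    [ ]?:?[ ]?                                       # a possible " : " (pieces still optional individually)
    (informational|warning|error|fatal[ ]error)?     # possibly followed by one of these
    [ ]([A-Z]+[0-9]{4}):[ ]                          # letters plus exactly four digits, then ": "
    (.*)$                                            # capture the rest
    """,
    re.VERBOSE,
)

# Hypothetical compiler output line, used only as test input.
sample = "foo.c(10) : fatal error C1083: Cannot open include file"
match = PATTERN.match(sample)
print(match.groups())

# In Python 3, \d on str patterns also matches non-ASCII decimal digits,
# so it is not strictly the same as [0-9] unless re.ASCII is set.
arabic_three = "\u0663"
print(re.fullmatch(r"\d", arabic_three) is not None)     # matches
print(re.fullmatch(r"[0-9]", arabic_three) is not None)  # does not match
```

Whether \d is an exact substitute for [0-9] depends on the regex flavor, so for a code like C1083 the explicit [0-9]{4} is the safer choice.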