My code parses lines in a log file.
I do many things with this, but a particular need has come up: finding a line that does not contain a certain substring, under a certain condition.
I have a pretty good understanding of regular expressions, but I can't seem to figure this one out.
The problem:
I want to capture any line that does not contain the word error or warn, unless it is the first part of the log entry and surrounded by square brackets.
So far, I have tried something like this:
(((?:abc|cba)\s+.*(?!\[?(?!error|warn)\]?).*)|((abc|cba)\s+\[(error|warn)\]\s+(.*)))
The lines in the log can look like some of these examples:
Capture group 2:
abc [error] message
cba [error] message
cba [warn] message
Capture group 1:
abc something random
cba i dont know
Don't capture:
abc some [error] message
cba some [warn] message
The problem in simpler English: I want to get any line that starts with abc or cba. Capture group 1 should grab the line if it doesn't have [error] or [warn] anywhere in it, and capture group 2 should get it only if [error] or [warn] is the first part of the entry (right after the abc or cba).
This should do the trick:
Note that I assert that the whole line matches the regex with ^ and $.
I first check for abc or cba starting the line.
Then there are 2 cases:
Neither [error] nor [warn] appears anywhere in the line: (?!.*(?:\[error\]|\[warn\])) (the ?: is not very important, it just makes the group non-capturing).
[error] or [warn] follows right after abc or cba: \s*(?:\[error\]|\[warn\]). Note that you may want to change \s* to \s+, since the current regex will match abc[error].
Then I don't care about the rest, .*, but it needs to be there, since I used $. I'm not totally sure about Python: check whether you can remove the .*$ part of the regex.
I make all groups non-capturing, since you seem to be asserting that the line follows a certain format. If you need to extract some data from the line at the same time, let me know.
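For reference, here is a minimal Python sketch that assembles the pieces described above into one pattern and runs it against the sample lines from the question. The pattern is reconstructed from the description, so it may differ in detail from the regex originally posted; the names LINE_RE, STRICT_RE and matches are just illustrative.

```python
import re

# Pattern assembled from the description above (a reconstruction, not
# necessarily the exact regex that was originally posted).
LINE_RE = re.compile(
    r"^(?:abc|cba)"                    # line starts with abc or cba
    r"(?:"
    r"(?!.*(?:\[error\]|\[warn\]))"    # case 1: no [error]/[warn] later in the line
    r"|"
    r"\s*(?:\[error\]|\[warn\])"       # case 2: [error]/[warn] right after the prefix
    r")"
    r".*$"                             # rest of the line
)

# Variant with \s+ in case 2, so that "abc[error]" is rejected.
STRICT_RE = re.compile(
    r"^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s+(?:\[error\]|\[warn\])).*$"
)


def matches(line, pattern=LINE_RE):
    """Return True if the line is accepted by the given pattern."""
    return pattern.match(line) is not None


samples = [
    "abc [error] message",       # tag right after the prefix
    "cba [warn] message",        # tag right after the prefix
    "abc something random",      # no tag anywhere
    "cba i dont know",           # no tag anywhere
    "abc some [error] message",  # tag later in the line
    "cba some [warn] message",   # tag later in the line
    "abc[error]",                # no whitespace before the tag
]

for line in samples:
    print(f"{line!r}: default={matches(line)}, strict={matches(line, STRICT_RE)}")
```

On the Python question: re.match only anchors at the start of the string, so if you only need a yes/no answer, dropping the trailing .*$ should not change which lines are accepted. With re.fullmatch the .* is still needed.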