Trying to achieve a regexp that filters bounced emails, distinguishing permanent bounces from SPAM or temporarily undeliverable messages.
Our idea is to grab certain words the expression could contain (code + word) but ignore the whole line if it contains others such as (SPAM|temporarily undeliverable|disk quota exceeded) etc., as these would not be considered permanent bounces. We've managed the first part and found a couple of answers here about negative regexps (http://stackoverflow.com/questions/1153856/string-negation-using-regular-expressions), but have been completely unsuccessful in combining both into a single expression so far.
Something like:
.*(5.3.0|5.1.0).*(User unknown|invalid|Unknown address|doesn't have a)
but it should not match if any of the excluded words appear anywhere on the same line. Something like:
^(?!(SPAM|temporarily undeliverable|disk quota exceeded)).*$
So the first of the following lines should match, but the second should not:
Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-"delivery error: dd This user doesn't have a btinternet.com account (xxxxxxxx@xxxxxinternet.com) [0] - mta1000.bt.mail.ird.yahoo.com" (delivery attempts: 0)
Diagnostic-Code: smtp; 5.1.0 - Unknown address error 550-'RCPT TO: Mailbox disk quota exceeded' (delivery attempts: 0)
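Your negative lookahead only checks for the excluded words at the start of the string. You just need to add a .* at the beginning of the lookahead so that it scans the rest of the line. Try:
^(?!.*(SPAM|temporarily undeliverable|disk quota exceeded)).*$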
See it here on Regexr
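For illustration, here is a minimal Python sketch of how the lookahead and the positive part from the question might be combined into one pattern. The names (START_ONLY, WHOLE_LINE, is_permanent_bounce) are hypothetical, the dots in the status codes are escaped (the question leaves them bare), and the sample lines are assumed to be unwrapped onto a single line with plain ASCII quotes, since . does not match a newline by default.

```python
import re

# Exclusion words and bounce markers, copied from the question.
EXCLUDED = r"SPAM|temporarily undeliverable|disk quota exceeded"
CODES = r"5\.3\.0|5\.1\.0"  # dots escaped here; an addition, not in the question
REASONS = r"User unknown|invalid|Unknown address|doesn't have a"

# Lookahead as written in the question: only checks the start of the line.
START_ONLY = re.compile(rf"^(?!(?:{EXCLUDED})).*(?:{CODES}).*(?:{REASONS})")

# Lookahead with a leading .* so it rejects the excluded words anywhere on the line.
WHOLE_LINE = re.compile(rf"^(?!.*(?:{EXCLUDED})).*(?:{CODES}).*(?:{REASONS})")


def is_permanent_bounce(line: str) -> bool:
    """Return True if the line looks like a permanent bounce (hypothetical helper)."""
    return WHOLE_LINE.search(line) is not None


# Sample lines from the question, assumed to be one line each with ASCII quotes.
FIRST = (
    "Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-\"delivery "
    "error: dd This user doesn't have a btinternet.com account "
    "(xxxxxxxx@xxxxxinternet.com) [0] - mta1000.bt.mail.ird.yahoo.com\" "
    "(delivery attempts: 0)"
)
SECOND = (
    "Diagnostic-Code: smtp; 5.1.0 - Unknown address error 550-'RCPT "
    "TO: Mailbox disk quota exceeded' (delivery attempts: 0)"
)

print(is_permanent_bounce(FIRST))             # True
print(is_permanent_bounce(SECOND))            # False
print(START_ONLY.search(SECOND) is not None)  # True: the start-only lookahead lets it through
```

The last line shows why the original attempt fails: the second sample does not begin with an excluded word, so a lookahead without the leading .* never sees "disk quota exceeded" further along the line.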