I’d like to think I’m pretty good at regular expressions, but this one is stumping me. I’m trying to match a certain type of language used in National Weather Service forecast bulletins. I’m using Perl 5.16 on Windows. I have also tested using this online regex tester. Here is an example message:
...A SEVERE THUNDERSTORM WARNING REMAINS IN EFFECT UNTIL 1130 PM CST FOR CENTRAL LAMAR COUNTY... AT 1106 PM CST...NATIONAL WEATHER SERVICE METEOROLOGISTS WERE TRACKING A SEVERE THUNDERSTORM CAPABLE OF PRODUCING PING PONG BALL SIZE HAIL...AND DESTRUCTIVE WINDS IN EXCESS OF 70 MPH. THIS STORM WAS LOCATED NEAR BAXTERVILLE MOVING EAST AT 50 MPH. THE SEVERE THUNDERSTORM WILL BE NEAR... PURVIS BY 1115 PM CST... WEST HATTIESBURG BY 1120 PM CST...
And here is my regex:
/A SEVERE THUNDERSTORM.+?(?<hsize>QUARTER|GOLF BALL|PING PONG BALL|HALF DOLLAR)?.+?WINDS (?:IN EXCESS OF|OVER) (?<wmph>\d+) MPH.+WAS LOCATED (?:(?<dist>\d+) MILES (?<dir>\w+) OF|(?<near>NEAR)) (?<loc>[\w ]+).+MOVING (?<mdir>\w+) AT (?<mph>\d+) MPH/
The problem is that the hsize capture group always comes back empty. I would like it to be optional but greedy; however, it never matches. I tried making it non-optional:
/A SEVERE THUNDERSTORM.+?(?<hsize>QUARTER|GOLF BALL|PING PONG BALL|HALF DOLLAR).+?WINDS (?:IN EXCESS OF|OVER) (?<wmph>\d+) MPH.+WAS LOCATED (?:(?<dist>\d+) MILES (?<dir>\w+) OF|(?<near>NEAR)) (?<loc>[\w ]+).+MOVING (?<mdir>\w+) AT (?<mph>\d+) MPH/
That does cause it to match, which makes no sense to me. As you can see, I’ve already made the wildcards non-greedy, so I don’t see what’s happening.
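The optional group comes up empty because the lazy .+? in front of it consumes a single character, the group fails to match at that position and, being optional, settles for matching nothing, and the second .+? then carries the match all the way to WINDS. Since the overall match succeeds, the engine never backtracks to try a longer first .+?. You can change a bit of your regex to force the engine to search for the special text before falling back to matching anything. Change this part of the regex: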
To:
The alternation forces the engine to exhaust every possibility of finding a match with the special keywords (the first alternative) before it goes on to match anything (the second alternative).
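As a rough sketch of that idea in Perl, here is one way such an alternation could be written and compared against the original pattern. The exact form used here, (?:.+?(?<hsize>...)|) with an empty second alternative, is an assumption consistent with the description rather than a confirmed replacement, and the names $sizes, $tail, $no_hail and captures() are made up for illustration. The second test message is derived from the example bulletin by deleting the hail phrase.

```perl
use strict;
use warnings;

my $msg = '...A SEVERE THUNDERSTORM WARNING REMAINS IN EFFECT UNTIL 1130 PM CST FOR CENTRAL LAMAR COUNTY... AT 1106 PM CST...NATIONAL WEATHER SERVICE METEOROLOGISTS WERE TRACKING A SEVERE THUNDERSTORM CAPABLE OF PRODUCING PING PONG BALL SIZE HAIL...AND DESTRUCTIVE WINDS IN EXCESS OF 70 MPH. THIS STORM WAS LOCATED NEAR BAXTERVILLE MOVING EAST AT 50 MPH. THE SEVERE THUNDERSTORM WILL BE NEAR... PURVIS BY 1115 PM CST... WEST HATTIESBURG BY 1120 PM CST...';

# Hypothetical variant of the bulletin with the hail phrase removed.
(my $no_hail = $msg) =~ s/PING PONG BALL SIZE HAIL\.\.\.AND //;

# Shared pieces, split out only to keep the two patterns readable.
my $sizes = qr/QUARTER|GOLF BALL|PING PONG BALL|HALF DOLLAR/;
my $tail  = qr/.+?WINDS (?:IN EXCESS OF|OVER) (?<wmph>\d+) MPH.+WAS LOCATED (?:(?<dist>\d+) MILES (?<dir>\w+) OF|(?<near>NEAR)) (?<loc>[\w ]+).+MOVING (?<mdir>\w+) AT (?<mph>\d+) MPH/;

# Pattern from the question: lazy wildcard followed by an optional group.
my $original = qr/A SEVERE THUNDERSTORM.+?(?<hsize>$sizes)?$tail/;

# Assumed rewrite: first alternative hunts for a hail size,
# the empty second alternative is used only if that hunt fails.
my $reworked = qr/A SEVERE THUNDERSTORM(?:.+?(?<hsize>$sizes)|)$tail/;

# Return a hashref of the named captures that took part in the match,
# or undef if the pattern did not match at all.
sub captures {
    my ($re, $text) = @_;
    return $text =~ $re ? +{ %+ } : undef;
}

for my $case ( [ 'original, hail'    => $original, $msg     ],
               [ 'reworked, hail'    => $reworked, $msg     ],
               [ 'reworked, no hail' => $reworked, $no_hail ] ) {
    my ($label, $re, $text) = @$case;
    my $c = captures($re, $text);
    printf "%-18s matched=%s hsize=%s\n",
        $label, ($c ? 'yes' : 'no'), ($c && $c->{hsize}) // '(empty)';
}
```

With the example bulletin, the original pattern should match with hsize empty, while the reworked one should capture PING PONG BALL and still match the variant that has no hail size in it.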