I have the following Pig Latin filter:
filtered = FILTER raw BY year >= 1960 AND string MATCHES '(?!.*[0-9].*|.{1}|.*@.*|.*www.*|.*http.*)';
I intended to get these results for the following strings:
a #false .{1}
email@example.com #false .*@.*
http://somesite.com #false .*http.*
www.somesite.com #false .*www.*
12word #false .*[0-9].*
wo12rd #false .*[0-9].*
word12 #false .*[0-9].*
red #true
Instead, I get an empty result set.
EDIT:
I’ve updated the regex to:
'^(?!.*[0-9].*|.{1}|.*@.*|.*www.*|.*http.*)$'
after m.buettner’s correction, but continue to get an empty result set.
There are two problems. First, it seems that Pig Latin requires the pattern to match the full string instead of "just a match somewhere within the string". But your negative lookahead does not consume any characters, so it does not match the full string. This can be resolved simply by appending .* after the lookahead. Second, your alternative .{1} (where {1} is redundant) does not require this one character to be the only character in the string. So in your last example, it will simply consume the r of red and set off the negative lookahead.
Thus, here is the solution:
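As a rough sketch of the two fixes described above (not necessarily the exact pattern the answer had in mind), the snippet below anchors the single-character alternative with `$` and appends `.*` after the lookahead. It is written in Python and uses `re.fullmatch` to imitate the full-string matching that Pig's `MATCHES` appears to perform. The names `PATTERN` and `keep` are hypothetical, and it is an assumption that Pig's Java-based regex engine treats these strings the same way.

```python
import re

# Hypothetical pattern applying both fixes:
#   1. ".$" instead of ".{1}", so the alternative only fires when the
#      string is exactly one character long
#   2. a trailing ".*" so the pattern consumes the whole string
PATTERN = re.compile(r"(?!.*[0-9]|.$|.*@|.*www|.*http).*")


def keep(s):
    """Return True if the whole string matches, as a full-string matcher would."""
    return PATTERN.fullmatch(s) is not None


if __name__ == "__main__":
    samples = [
        "a", "email@example.com", "http://somesite.com",
        "www.somesite.com", "12word", "wo12rd", "word12", "red",
    ]
    for s in samples:
        print(s, keep(s))
```

Under these assumptions, the corresponding Pig filter would use `string MATCHES '(?!.*[0-9]|.$|.*@|.*www|.*http).*'`. The trailing `.*` inside each lookahead alternative is dropped because a lookahead only needs to find the offending substring, not reach the end of the input.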