I wrote a regular expression to validate strings that must adhere to the following rules:
- Must be at least one character
- Must contain no blank characters
- First character may not be punctuation
- Last character may not be punctuation
- May not end in punctuation followed by digits
- All other characters may be any UTF-8 character other than those matching /[:@#]/.
Here is the regex:
my $name_re = qr/
[^[:punct:][:blank:]] # not punct or blank
(?: # followed by...
[^[:blank:]:@#]* # any number of non-blank, non-:, non-@, non-# characters
[^[:punct:][:blank:]] # one not blank or punct
)? # ... optionally
/x;
See anything missing? Rule #5, that a name may not end in punctuation followed by digits, is not enforced. I’ve been enforcing it by writing code such as this:
die "$proj is not a valid name" unless $proj =~ /\A$name_re\z/
&& $proj !~ /[[:punct:]][[:digit:]]+\z/;
There are a bunch of places I have to do this, so I would rather that it all be done in a single regular expression. The question is: how? What regular expression would reject a value such as “foo,23”?
The following should work:
Note that I moved the anchors inside the regex. This may not be strictly necessary with your current method, but I think it is clearer to have everything in one place.
(?!...) and (?<!...) are negative lookaheads and negative lookbehinds, respectively. They make checks like this fairly simple: the middle section is essentially “match these valid characters”, with a lookahead at the beginning and a lookbehind at the end to check the conditions on the first and last characters.
The negative lookahead in the middle verifies that, from the given position, we cannot match to the end of the string with only punctuation followed by digits; in other words, it makes sure that rule 5 is not violated. Because this lookahead is inside the repeated group, it is checked at every position.
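As a minimal sketch of the approach described above, the three checks might be combined like this. It is a reconstruction from that description, not necessarily the exact pattern the answer had in mind. The helper name is_valid_name and the sample values other than foo,23 are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sketch only: reconstructed from the prose description of the approach.
my $name_re = qr/
    \A                                    # start of string
    (?! [[:punct:]] )                     # first character is not punctuation
    (?:                                   # repeated group, one character per pass
        (?! [[:punct:]] [[:digit:]]+ \z ) # not punctuation plus digits up to the end
        [^[:blank:]:@#]                   # one allowed character
    )+                                    # at least one character overall
    (?<! [[:punct:]] )                    # last character is not punctuation
    \z                                    # end of string
/x;

# Hypothetical helper so the check lives in one place.
sub is_valid_name {
    my ($name) = @_;
    return $name =~ $name_re ? 1 : 0;
}

# Try a few sample values (single-quoted so nothing is interpolated).
for my $name ('foo', 'foo23', 'foo,bar', 'foo,23', 'foo,', ',foo', 'foo bar', 'foo@bar') {
    printf "%-8s %s\n", $name, is_valid_name($name) ? 'valid' : 'invalid';
}
```

With this sketch, foo,23 is rejected because the inner lookahead fails at the comma, so the repeated group cannot reach the end of the string. foo,bar and foo23 are still accepted.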
This would be simpler if you could use a variable-length lookbehind, but I don’t think Perl supports them.