While searching for regular expressions used for email address validation, I came across this page: http://www.regular-expressions.info/email.html. I couldn't understand it.
It says: \b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b will match john@server.department.company.com but not john@aol...com.
Can you explain how (?:[A-Z0-9-]+\.) works in detail and how it doesn’t match john@aol...com and matches the other one?
That's because the `\.` inside the group matches exactly one dot, so several dots in a row will not be matched. For `..` or `...` etc. to be matched, it would have to be `\.+` (the `+` means once or more, and is the same as `{1,}`).
The regex says `(?:[A-Z0-9-]+\.)+`, so it is one or more alphanumeric characters (or hyphens), followed by a single dot, and this whole thing can repeat once or more. So `c.c.c.` will match, but `c..c.c.` will not: after the first `c.` the next character has to be a letter, digit, or hyphen, not another dot.
The `(?: )` is non-capturing, and is usually faster. You can use `( )` and it works as well, but it is slightly slower and the matched text will go into a capturing group.
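
If it helps to see this in action, here is a small sketch in Python's `re` module. The names `EMAIL`, `LABELS`, `LABELS_CAPTURING` and `whole_match` are made up for this illustration, and `re.IGNORECASE` is an assumption on my part, needed because the character classes only list uppercase letters. It checks whole strings with `fullmatch`; a plain `search` could still find a shorter piece such as `c.` inside `c..c.c.`.

```python
import re

# The pattern from the question, with the dot escaped.
# IGNORECASE is an assumption so that lowercase addresses are accepted.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b", re.IGNORECASE)

# Just the repeated group, in non-capturing and capturing form.
LABELS = re.compile(r"(?:[A-Z0-9-]+\.)+", re.IGNORECASE)
LABELS_CAPTURING = re.compile(r"([A-Z0-9-]+\.)+", re.IGNORECASE)


def whole_match(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


print(whole_match(EMAIL, "john@server.department.company.com"))  # True
print(whole_match(EMAIL, "john@aol...com"))                      # False

print(whole_match(LABELS, "c.c.c."))   # True: each "c." is one repetition
print(whole_match(LABELS, "c..c.c."))  # False: nothing can sit between two dots

# The capturing version keeps the text of the last repetition,
# the non-capturing version stores nothing.
print(LABELS.fullmatch("c.c.c.").groups())            # ()
print(LABELS_CAPTURING.fullmatch("c.c.c.").groups())  # ('c.',)
```

In `john@aol...com` the group matches `aol.` once, then both a second repetition and the final `[A-Z]{2,4}` need a letter where the next dot is, so the match fails.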