I am trying to write a regex to catch email IDs. I have been testing it for quite a few hours on regexpal.com, where it catches all the email IDs. When I use the same regex in Python with re.findall(pattern, line), it does not return them.
Regex:
[a-zA-Z0-9-_]+[(.)?a-zA-Z0-9-_]*\s*(@|at)\s*[a-zA-Z0-9-_]+\s*(.|dot)\s*[a-zA-Z0-9-_]*\s*(.|dot)\s*e(\-)?d(\-)?u(\-)?(.,)?
Example:
Line = <TR> <TD><B>E-Mail: </B> <TD><A HREF=MailTo:*example.young@stackoverflow.edu*\>*example.young@stackoverflow.edu*</A>
(Highlighted correctly on regexpal.com).
With Python:
for line in f:
    print 'Line = ',line
    matches = re.findall(my_first_pat,line)
    print 'Matches = ',matches
Gives output:
Line = <TR> <TD><B>E-Mail: </B> <TD><A HREF=MailTo:example.young@stackoverflow.edu>example.young@stackoverflow.edu</A>
Matches = [('@', 'd', '.', '', '', '', ''), ('@', 'd', '.', '', '', '', '')]
What is the issue?
Read the documentation for re.findall: when the pattern contains groups, it returns the groups rather than the whole match. Your groups only capture the at sign, dot, etc., so that's all that re.findall returns. Either use non-capturing groups, wrap the whole thing in a group, or use re.finditer.
(As noted by @Igor Chubin, your regex is also incorrectly using . instead of \., but this isn't causing the main problem.)
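A minimal sketch of the difference, using a deliberately simplified pattern rather than the full regex from the question. The names capturing, non_capturing, and the other variables are made up for illustration, and the pattern is only an assumption about what the intended match looks like:

```python
import re

line = ('<TR> <TD><B>E-Mail: </B> <TD><A HREF=MailTo:example.young@stackoverflow.edu>'
        'example.young@stackoverflow.edu</A>')

# Simplified, hypothetical patterns (not the question's full regex).
# Same pattern twice: once with capturing groups, once with (?:...) groups.
capturing = r'[\w.-]+\s*(@|at)\s*[\w-]+\s*(\.|dot)\s*edu'
non_capturing = r'[\w.-]+\s*(?:@|at)\s*[\w-]+\s*(?:\.|dot)\s*edu'

# With capturing groups, findall returns one tuple of groups per match.
groups_only = re.findall(capturing, line)

# With non-capturing groups, findall returns the whole matched text.
full_matches = re.findall(non_capturing, line)

# finditer yields match objects, so group(0) is the full match
# even when the pattern has capturing groups.
via_finditer = [m.group(0) for m in re.finditer(capturing, line)]

# Side issue: an unescaped . matches any character, so it wins over 'dot'.
unescaped = re.search(r'(.|dot)', 'dot').group(0)
escaped = re.search(r'(\.|dot)', 'dot').group(0)

print(groups_only)
print(full_matches)
print(via_finditer)
print(unescaped, escaped)
```

Wrapping the entire pattern in one extra pair of parentheses would also work, but the full match then comes back as the first element of each tuple alongside the inner groups, so non-capturing groups or re.finditer are usually tidier.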