I'm writing a Python program that uses a regex to find email addresses. The re.findall function gives the wrong output whenever I use round brackets for grouping. Can anyone point out the mistake or suggest an alternative solution?
Here are two snippets of code to explain:
pat = "[\w]+[ ]*@[ ]*[\w]+.[\w]+"
re.findall(pat, 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com ')
gives the output
['abc@cs.stansoft', 'myacc@gmail.com']
However, if I use grouping in this regex and modify the code as
pat = "[\w]+[ ]*@[ ]*[\w]+(.[\w]+)*"
re.findall(pat, 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com ')
the output is
['.com', '.com']
To confirm that the regex is correct, I tried the regex from the second example at http://regexpal.com/ with the same input string, and both email addresses were matched successfully.
In Python, re.findall returns the whole match only if the pattern contains no groups; if it does contain groups, it returns the groups instead. To get around this, use a non-capturing group, (?:...). In this case, that means changing (.[\w]+)* to (?:.[\w]+)* in the pattern.
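A minimal sketch of that change, using the input string from the question. The variable names (text, capturing, non_capturing, whole) are only illustrative, and raw strings are used so that \w reaches the regex engine without escape warnings. The last part shows an alternative, re.finditer with group(0), which returns the whole match even when the pattern keeps its capturing group.

```python
import re

text = 'abc@cs.stansoft.edu.com .rtrt.. myacc@gmail.com '

# Pattern from the question: because it contains a capturing group,
# findall returns only what that group captured on its last repetition.
capturing = r"[\w]+[ ]*@[ ]*[\w]+(.[\w]+)*"
print(re.findall(capturing, text))
# ['.com', '.com']

# Same pattern with a non-capturing group (?:...): there are no
# capturing groups left, so findall returns the whole match.
non_capturing = r"[\w]+[ ]*@[ ]*[\w]+(?:.[\w]+)*"
print(re.findall(non_capturing, text))
# ['abc@cs.stansoft.edu.com', 'myacc@gmail.com']

# Alternative: keep the capturing group and ask each match object
# for group(0), which is always the entire match.
whole = [m.group(0) for m in re.finditer(capturing, text)]
print(whole)
# ['abc@cs.stansoft.edu.com', 'myacc@gmail.com']
```

Separately from the findall behaviour, the unescaped . in the pattern matches any character, not just a literal dot; writing \. would restrict it to a dot if that is what is intended.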