When parentheses are used in the program below, the output is
['www.google.com'].
import re
teststring = 'href="www.google.com"'
# Pattern with one capturing group around the quoted value
m = re.findall(r'href="(.*?)"', teststring)
print(m)
If the parentheses are removed from the findall call, the output is ['href="www.google.com"'].
import re
teststring = 'href="www.google.com"'
# Same pattern without the capturing group
m = re.findall(r'href=".*?"', teststring)
print(m)
It would be helpful if someone could explain how this works.
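The re.findall() documentation is quite clear on the difference: if the pattern contains one or more groups, findall() returns a list of groups instead of a list of whole matches.

So .findall() returns a list containing one of three types of values, depending on the number of groups in the pattern:

- 0 capturing groups in the pattern (no (...) parentheses): the whole matched string ('href="www.google.com"' in your second example).
- 1 capturing group in the pattern: the text captured by that group ('www.google.com' in your first example).
- More than 1 capturing group in the pattern: a tuple of all the captured groups.

Use non-capturing groups ((?:...)) if you don't want that behaviour, or add groups if you want more information. For example, adding a group around the href= part would result in a list of tuples with two elements each.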
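The cases described above can be sketched side by side. This is a minimal illustration that reuses the question's teststring; the variable names (no_groups, one_group, two_groups, non_capturing) are only for this example, and it is written with Python 3's print().

```python
import re

teststring = 'href="www.google.com"'

# No capturing groups: each result is the whole matched string.
no_groups = re.findall(r'href=".*?"', teststring)

# One capturing group: each result is the text captured by that group.
one_group = re.findall(r'href="(.*?)"', teststring)

# Two capturing groups: each result is a tuple with one entry per group.
two_groups = re.findall(r'(href=)"(.*?)"', teststring)

# Non-capturing group (?:...): groups part of the pattern without
# changing what findall() returns, so the whole match comes back.
non_capturing = re.findall(r'(?:href=)".*?"', teststring)

print(no_groups)      # ['href="www.google.com"']
print(one_group)      # ['www.google.com']
print(two_groups)     # [('href=', 'www.google.com')]
print(non_capturing)  # ['href="www.google.com"']
```

If you need the whole match and the groups together, re.finditer() is an alternative: it yields match objects, so m.group(0) gives the full match and m.group(1) the first group.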