I have an HTML file that contains a line:
a = '<li><a href="?id=11&sort=&indeks=0,3" class="">H</a></li>'
When I search:
re.findall(r'href="?(\S+)"', a)
I get expected output:
['?id=11&sort=&indeks=0,3']
However, when I add “i” to the pattern like:
re.findall(r'href="?i(\S+)"', a)
I get:
[]
Where’s the catch?
Thank you in advance.
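The problem is that the ? has a special meaning in a regular expression and is not being matched literally. To fix, escape the ? with a backslash:
re.findall(r'href="\?i(\S+)"', a)
Otherwise, the ? is treated as the optional quantifier applied to the ". This happens to work (by accident) in your first example, because the optional " matches and \S+ then captures the literal ? along with the rest of the URL. It doesn't work in the second, because the regex expects an i immediately after the optional ", but the next character in the string is ?.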
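A quick way to check this is to run the three patterns side by side. The sketch below reuses the string a from the question; the variable names (unescaped, unescaped_i, escaped_i, whole) are only illustrative.

```python
import re

a = '<li><a href="?id=11&sort=&indeks=0,3" class="">H</a></li>'

# Unescaped: "? means "an optional double quote", so the literal ?
# in the string ends up inside the captured group.
unescaped = re.findall(r'href="?(\S+)"', a)

# Unescaped with i: after the optional quote the regex requires an i,
# but the next character in the string is ?, so nothing matches.
unescaped_i = re.findall(r'href="?i(\S+)"', a)

# Escaped: \? matches the literal question mark, so the i can match too.
escaped_i = re.findall(r'href="\?i(\S+)"', a)

# Moving the parenthesis keeps the leading ?i in the captured text.
whole = re.findall(r'href="(\?i\S+)"', a)

print(unescaped)    # ['?id=11&sort=&indeks=0,3']
print(unescaped_i)  # []
print(escaped_i)    # ['d=11&sort=&indeks=0,3']
print(whole)        # ['?id=11&sort=&indeks=0,3']
```

Note that with the escaped pattern the ? and the i are consumed outside the group, so the capture starts at d. If the full query string is wanted, the last variant (group opened before \?) is probably the closer fit.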