I’m using re.findall() to extract some version numbers from an HTML file:
>>> import re
>>> text = '<table><td><a href=\'url\'>Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1'
>>> re.findall('Test([\.0-9]*)', text)
['0.2.1.', '0.2.1', '0.2.1']
but I would like to only get the ones that do not end in a dot. The filename might not always be .zip so I can’t just stick .zip in the regex.
I want to end up with:
['0.2.1', '0.2.1']
Can anyone suggest a better regex to use? 🙂
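One possible approach, as a sketch rather than a definitive answer: make the captured group end on a digit, and optionally add a negative lookahead so that a match followed by a dot or digit is rejected. This assumes every real version number ends in a digit; the names trimmed and strict are just illustrative.

```python
import re

text = ("<table><td><a href='url'>Test0.2.1.zip</a></td>"
        "<td>Test0.2.1</td></table> Test0.2.1")

# Force the captured group to end on a digit: the trailing dot of the
# filename match is trimmed, but the filename itself is still matched.
trimmed = re.findall(r'Test([0-9.]*[0-9])', text)

# Additionally reject matches that are followed by a dot or a digit,
# which skips the "Test0.2.1.zip" filename entirely.
strict = re.findall(r'Test([0-9.]*[0-9])(?![0-9.])', text)

print(trimmed)  # ['0.2.1', '0.2.1', '0.2.1']
print(strict)   # ['0.2.1', '0.2.1']
```

Only the second pattern produces the two-element list asked for; the first keeps the filename's version but drops its trailing dot.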
or, a bit shorter:
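For instance, a compact pattern might look like the sketch below. It assumes version numbers always end in a digit, uses \d in place of [0-9], and adds a negative lookahead so that a version followed by another dot (as in a filename) is skipped. The re.ASCII flag is there because \d otherwise also matches non-ASCII digits in Python 3 string patterns.

```python
import re

text = ("<table><td><a href='url'>Test0.2.1.zip</a></td>"
        "<td>Test0.2.1</td></table> Test0.2.1")

# Digits and dots, ending on a digit, and not followed by a dot or digit.
short = re.findall(r'Test([\d.]*\d)(?![\d.])', text, flags=re.ASCII)

print(short)  # ['0.2.1', '0.2.1']
```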
By the way, you do not need to escape the dot in a character class. Inside
[] the . has no special meaning; it just matches a literal dot, so escaping it has no effect.
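A quick way to check this yourself (the sample string and variable names are made up for illustration):

```python
import re

sample = 'a.b-c'

# Inside [...] both spellings match only a literal dot.
plain_in_class = re.findall(r'[.]', sample)
escaped_in_class = re.findall(r'[\.]', sample)
print(plain_in_class, escaped_in_class)  # ['.'] ['.']

# Outside a character class the escape does matter: an unescaped dot
# matches any character except a newline.
plain_outside = re.findall(r'.', sample)
escaped_outside = re.findall(r'\.', sample)
print(plain_outside)    # ['a', '.', 'b', '-', 'c']
print(escaped_outside)  # ['.']
```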