I want to remove all the tags in an HTML file, and I am using Python's re module for that.
For example, consider the line <h1>Hello World!</h1>. I want to retain only "Hello World!". To remove the tags, I used re.sub('<.*>', '', string). The result is an empty string, because the regexp matches from the first opening angle bracket to the last closing one and removes everything in between. How can I get around this?
You can make the match non-greedy:
'<.*?>'
You also need to be careful: HTML is a crafty beast, and can thwart your regexes.
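Here is a minimal sketch comparing the two patterns on the line from the question. The variable names and the last input (an attribute value containing '>') are made up for illustration; that input is just one case where the non-greedy pattern still goes wrong.

```python
import re

line = "<h1>Hello World!</h1>"

# Greedy: .* runs from the first '<' to the last '>', so the whole line is removed.
greedy = re.sub(r"<.*>", "", line)

# Non-greedy: .*? stops at the nearest '>', so each tag is removed on its own.
lazy = re.sub(r"<.*?>", "", line)

print(repr(greedy))  # ''
print(repr(lazy))    # 'Hello World!'

# Hypothetical tricky input: a '>' inside an attribute value ends the match early,
# so part of the tag is left behind in the output.
tricky = '<a title="a > b">link</a>'
leftover = re.sub(r"<.*?>", "", tricky)
print(repr(leftover))  # ' b">link'
```

For simple, well-behaved markup the non-greedy pattern is usually enough; for arbitrary HTML, a real parser such as the standard library's html.parser is likely a safer choice.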