I have an HTML file and need to replace every piece of text that looks like this: [%anytext%]. As I understand it, parsing HTML is very easy with BeautifulSoup. But what regular expression should I use, and how do I remove the text and write the data back?
Okay, here is the sample file:
<html>
[t1] [t2] ... [tood] ... [sadsada]
Sample text [i8]
[d9]
</html>
The Python script must handle every such string and replace each [%] with some other string, for example:
<html>
* * ... * ... *
Sample text *
*
</html>
What I did:
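import re
import codecs

fullData = ''
# Read the file line by line, decoding it as UTF-8.
for line in codecs.open(u'test.txt', encoding='utf-8'):
    # Lazily replace every [...] span on the line with '*'.
    line = re.sub(r"\[.*?\]", '*', line)
    fullData += line
print(fullData)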
This code does exactly what I described in the sample. Thanks, all.
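For the "write back" part of the question, here is a minimal sketch. It assumes the whole file fits in memory; the helper names `replace_bracketed` and `rewrite_file` and the output path are hypothetical, chosen only for illustration.

```python
import io
import re

# Matches "[", then as few characters as possible, then "]".
# "." does not match a newline, so a span never crosses a line break.
BRACKETED = re.compile(r"\[.*?\]")


def replace_bracketed(text, replacement="*"):
    """Return text with every [...] span replaced by `replacement`."""
    return BRACKETED.sub(replacement, text)


def rewrite_file(src_path, dst_path, replacement="*"):
    """Read src_path as UTF-8, replace the spans, write the result to dst_path."""
    with io.open(src_path, encoding="utf-8") as src:
        data = src.read()
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(replace_bracketed(data, replacement))
```

Calling `rewrite_file(u'test.txt', u'out.txt')` would leave the original untouched and put the replaced text in a second file. Writing to a separate path is a deliberate choice here, so a bad pattern cannot destroy the input.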
A regex does the trick if you need to replace any text between “[%” and “%]”.
The code would look something like this:
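A minimal sketch, assuming the markers are literally "[%" and "%]" and the replacement is "*"; the sample string is made up for illustration:

```python
import re

# Assumed markers: a literal "[%" opener and a literal "%]" closer.
# ".*?" is lazy, so each match stops at the nearest "%]".
PATTERN = re.compile(r"\[%.*?%\]")

sample = "Hello [% name %], you have [% count %] new messages."
result = PATTERN.sub("*", sample)
print(result)  # Hello *, you have * new messages.
```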
The regex used here is lazy, so it matches everything between an occurrence of “[%” and the next occurrence of “%]”. You could make it greedy by removing the question mark; it would then match everything between the first occurrence of “[%” and the last occurrence of “%]”.
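The difference is easy to see on one line with two marker pairs. This is only an illustration; the input string is invented:

```python
import re

text = "a [% x %] b [% y %] c"

# Lazy: each "[% ... %]" pair is matched and replaced separately.
lazy = re.sub(r"\[%.*?%\]", "*", text)

# Greedy: one match runs from the first "[%" to the last "%]".
greedy = re.sub(r"\[%.*%\]", "*", text)

print(lazy)    # a * b * c
print(greedy)  # a * c
```

The greedy version also swallows the ordinary text between the two pairs (" b " here), which is rarely what you want for this kind of replacement.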