I have to write a program that removes all expressions of the form <word> and </word>, where word is any sequence of letters (lower and upper case).
It should also remove all expressions of the form <word ..... > and </word>, where word is the same as before. For example, it should remove <a href="wwang3.htm" class="c l">.
So far my code looks like this:
def remove_1( file_location ):
    """"""
    import re
    file_variable = open( file_location )
    lines = file_variable.read()
    p = re.findall('<.*?>', lines)
    print p
    substitution = re.compile('<.*?>')
    print substitution.subn( ' ', p )
When I run the program, I get an error that points to the line print substitution.subn( ' ', p ) and says that it expected a string or buffer. Any help is greatly appreciated.
You are passing p as the string to substitute into. However, p is the result of findall, which is a list, not a string, and subn expects a string as its second argument.
I would suggest doing it like this:
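One way this could look, as a minimal sketch rather than a definitive fix: run the substitution on the text read from the file (a string) instead of on the list returned by findall. The helper name strip_tags and the tighter pattern below are assumptions made for illustration. The pattern only matches tags whose name is a run of letters, as described in the question, and the tags are replaced with an empty string rather than a space.

```python
import re

# Assumed pattern: "<" or "</", then a run of letters, then optionally
# whitespace followed by anything up to the closing ">".
TAG_PATTERN = re.compile(r'</?[A-Za-z]+(?:\s[^>]*)?>')


def strip_tags(text):
    """Return text with every matching tag removed (hypothetical helper)."""
    # sub() takes the string to search, not the list produced by findall().
    return TAG_PATTERN.sub('', text)


def remove_1(file_location):
    """Read the file and return its contents with the tags stripped."""
    with open(file_location) as file_variable:
        contents = file_variable.read()
    return strip_tags(contents)
```

If the number of replacements is also needed, TAG_PATTERN.subn('', text) returns a tuple of the new string and the count, which is why printing the result of subn shows a tuple.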