I have a text file with ; used as the delimiter. The problem is that it has some HTML text formatting in it, such as &gt;. Obviously the ; in this causes problems.
The text file is large and I don't have a list of these HTML strings; there are many different examples, such as &amp;. How can I remove all of them using Python?
The file is a list of names, addresses, phone numbers and a few more fields. I am looking for the crap.html.remove(textfile) module.
The quickest way is probably to use the undocumented but so far stable unescape method in HTMLParser.

Note this will necessarily output a Unicode string, so if you have any non-ASCII bytes in there you will need to s.decode(encoding) first.
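As a rough sketch of the idea: the HTMLParser method mentioned above belongs to Python 2, and on Python 3 the documented html.unescape function covers the same ground, so that is what this example assumes. The helper names (unescape_line, split_fields) and the sample record are made up for illustration and are not from the question.

```python
import html


def unescape_line(line):
    """Replace HTML character references (e.g. &gt; or &amp;) with the characters they stand for."""
    return html.unescape(line)


def split_fields(line, delimiter=";"):
    """Unescape a record, then split it into fields.

    Unescaping first matters: it removes the ';' that terminates each
    entity, so only the real delimiters are left when we split.
    """
    return unescape_line(line).rstrip("\n").split(delimiter)


# Hypothetical record in the shape described in the question.
sample = "Smith &amp; Sons;12 High St;555-0100 &gt; ext 4"
print(split_fields(sample))
# ['Smith & Sons', '12 High St', '555-0100 > ext 4']
```

When reading from a real file on Python 3, opening it with open(path, encoding=...) should take care of the decoding step, so each line is already a Unicode string by the time it is unescaped.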