I want to strip all HTML tags from a string except the ones I specify.
If I call the constructor with default values everything works fine:
>>> import lxml.html.clean
>>> cleaner = lxml.html.clean.Cleaner()
>>> cleaner.clean_html('''<i>italic</i><script>alert('');</script>''')
'<span><i>italic</i></span>'
But when I try to specify some tags, it stops working:
>>> allowed_tags = ['i','s']
>>> cleaner = lxml.html.clean.Cleaner(remove_unknown_tags=False,allow_tags=allowed_tags)
>>> cleaner.clean_html('''<i>italic</i><s>strike</s>''')
'<span></span>'
So what am I doing wrong?
As a workaround, you can add the span and div tags to allowed_tags.
UPD: lxml.html.clean.Cleaner converts the string to an HTML tree by calling fromstring, which checks whether the document has a root node and adds one if necessary. That synthetic root is a span or a div, so you need to allow the span and div tags as well.
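The workaround can be sketched as follows. This is a minimal example, not a definitive fix: it assumes lxml is installed, and the fallback import is an assumption for recent lxml releases, where the cleaner is shipped in the separate lxml_html_clean package. The variable names fragment, root and result are only illustrative.

```python
import lxml.html

try:
    from lxml.html.clean import Cleaner
except ImportError:
    # Assumption: newer lxml releases ship the cleaner separately
    # in the lxml_html_clean package.
    from lxml_html_clean import Cleaner

fragment = "<i>italic</i><s>strike</s>"

# fromstring() wraps a fragment with several top-level elements in a
# synthetic root element; printing its tag shows what must be allowed.
root = lxml.html.fromstring(fragment)
print(root.tag)

# Allow the wrapper tags in addition to the tags we actually want to keep.
allowed_tags = ["i", "s", "span", "div"]
cleaner = Cleaner(remove_unknown_tags=False, allow_tags=allowed_tags)

result = cleaner.clean_html(fragment)
print(result)
```

The result still contains the wrapper element, so strip it afterwards if you need only the inner markup.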