I want to use DOMDocument to parse a string that comes from a rich-text editor. Exactly what I need is:
1) Allow only the (div, p, span, b, ul, ol, li, blockquote, br) tags, and remove all other tags along with their content
Edit:
I’m using strip_tags() for this
2) Allow only these styles:
- style="font-weight:bold"
- style="font-style: italic"
- style="text-decoration: underline"
3) Remove all attributes from the allowed tags (class, id, etc.), except the align attribute
Any ideas?
I would recommend against trying to filter HTML input with DOMDocument for security reasons, in particular because of the risk of cross-site scripting. You can easily take care of requirements 1 and 3 with a filter library like HTML Purifier. For the reasons Spudley mentions, requirement 2 is a little more difficult. I'd start by whitelisting those style attributes in HTML Purifier, then use some logic to scan for them after filtering and add the appropriate tags inside that element.
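The scan for requirement 2 could look something like the following. This is only a sketch in plain PHP: `filterAllowedStyles()` is a made-up helper name, and it assumes the value of a `style` attribute has already been pulled out of an element after the filter library has run. It keeps a declaration only when both the property and the value match one of the three allowed pairs.

```php
<?php
// Hypothetical helper: reduce a style attribute value to the three
// property/value pairs the question allows. Everything else is dropped.
function filterAllowedStyles(string $style): string
{
    // Allowed property => the single value permitted for it.
    $allowed = [
        'font-weight'     => 'bold',
        'font-style'      => 'italic',
        'text-decoration' => 'underline',
    ];

    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        // Split "property: value" on the first colon only.
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // empty or malformed declaration
        }
        $property = strtolower(trim($parts[0]));
        $value    = strtolower(trim($parts[1]));

        if (isset($allowed[$property]) && $allowed[$property] === $value) {
            $kept[] = $property . ': ' . $value;
        }
    }

    return implode('; ', $kept);
}
```

If the function returns an empty string, the `style` attribute can simply be removed from that element.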
Here's an example of using HTML Purifier the way you want (taken from basic.php). The only things I've changed are the HTML.AllowedAttributes and HTML.AllowedElements settings.
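A minimal sketch of that kind of configuration, assuming HTML Purifier was installed through Composer. The `vendor/autoload.php` path and the `$dirty` sample input are illustrative and not copied from basic.php:

```php
<?php
// Assumption: HTML Purifier is available via Composer's autoloader.
require_once 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Requirement 1: whitelist of permitted elements.
$config->set('HTML.AllowedElements', 'div,p,span,b,ul,ol,li,blockquote,br');

// Requirement 3: only align and style survive on any element.
$config->set('HTML.AllowedAttributes', '*.align,*.style');

$purifier = new HTMLPurifier($config);

// Hypothetical editor output, used only to exercise the filter.
$dirty = '<div class="x" id="y" align="center">'
       . '<span style="font-weight:bold" onclick="evil()">Hello</span>'
       . '<script>alert(1)</script>'
       . '</div>';

$clean = $purifier->purify($dirty);
echo $clean;
```

Note that this only restricts which elements and attributes are kept. The values inside `style` still need the extra scan described for requirement 2.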
Which outputs: