I have html stored in the database and I need to output it to the page.
- If I don’t escape() it, then I get the bold formatting I want, but I run the risk of getting an XSS from the unescaped html source.
- If I escape() it, then it shows the raw html code <b>bold text</b> instead of bold text.
How can I escape everything except a few tags? I’m thinking of applying escape() first, then searching for <b> and </b> and unescaping them. Would that work? Do you see any security problems with it? I’m also not sure how I would search for the <b></b> tags. A regex, maybe?
P.S. the escape() I mean is a function in Zend. I believe it’s the equivalent of htmlspecialchars().
Unescaping is the way to go. If you only whitelist a couple of tags to be converted back from their html escapes, and you match them exactly (bare <b> and </b>, no attributes), then you won’t run into XSS exploits.
Substitute markups such as BBcode provide no security advantage over this, as the many failed BBcode parsers prove.
(Instead of converting back and forth, however, it might be sensible to use HTML Purifier.)
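
The escape-then-unescape approach can be sketched roughly as follows. This is only a minimal illustration, not a vetted sanitizer: the function name escape_except() and its $allowedTags parameter are made up for this example, and it calls htmlspecialchars() directly on the assumption that Zend's escape() behaves the same way. No regex is needed, because only exact, attribute-free tags are converted back.

```php
<?php
// Hypothetical helper (not part of Zend): escape everything,
// then restore a fixed whitelist of bare, attribute-free tags.
function escape_except(string $html, array $allowedTags = ['b']): string
{
    // Escape all markup first, so nothing from the input is trusted.
    $escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

    // Build exact-match replacements: "&lt;b&gt;" => "<b>", "&lt;/b&gt;" => "</b>".
    $map = [];
    foreach ($allowedTags as $tag) {
        $map["&lt;{$tag}&gt;"]  = "<{$tag}>";
        $map["&lt;/{$tag}&gt;"] = "</{$tag}>";
    }

    // strtr() swaps only those literal strings; everything else stays escaped,
    // including tags with attributes such as <b onclick="...">.
    return strtr($escaped, $map);
}

// Usage: the <b> pair is restored, the <script> tag stays escaped.
echo escape_except('<b>bold text</b> <script>alert(1)</script>');
```

Note that this does not check that the restored tags are balanced, so a stray </b> in the stored html can still affect the surrounding page layout. That is one reason a real HTML filter may be the better choice.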