After I implemented my sanitize functions (according to the requested specs), my boss decided to change the accepted input. Now he wants to keep some specific tags and their attributes. I suggested implementing a BBCode-like language, which is safer IMHO, but he doesn't want to because it would be too much work.
This time I would like to keep it simple, so I won't kill him the next time he asks me to change this again. And I know he will.
Is it enough to first use strip_tags with the parameter listing the tags to preserve, and then htmlentities?
strip_tags does not necessarily result in safe content. strip_tags followed by htmlentities would be safe, in that anything HTML-encoded is safe, but it doesn't make any sense.
Either the user is inputting plain text, in which case it should be output using htmlspecialchars (in preference to htmlentities), or they're inputting HTML markup, in which case you need to parse it properly, fixing broken markup and removing elements/attributes that aren't in a safe whitelist.
If that's what you want, use an existing library to do it (e.g. HTML Purifier). It's not a trivial task, and if you get it wrong you've given yourself XSS security holes.
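
To make the first two points concrete, here is a minimal sketch using only PHP's built-in functions. The input string and variable names are made up for illustration; the point is that strip_tags keeps the attributes of any tag you allow, and that encoding the result afterwards also encodes the tags you just preserved.

```php
<?php
// Hypothetical user input: an allowed <b> tag carrying an event-handler attribute.
$input = '<b onmouseover="alert(1)">bold</b><script>alert(2)</script>';

// strip_tags() drops the <script> tags (not their text content) but leaves
// the attributes of the allowed <b> tag untouched.
$stripped = strip_tags($input, '<b>');
// $stripped is: <b onmouseover="alert(1)">bold</b>alert(2)

// Encoding afterwards is safe, but the preserved <b> tag is now just literal
// text in the page, so keeping it achieved nothing.
$encoded = htmlspecialchars($stripped, ENT_QUOTES, 'UTF-8');

// Plain-text case: no stripping at all, only encode on output.
$plain = htmlspecialchars('1 < 2 & "quotes"', ENT_QUOTES, 'UTF-8');
// $plain is: 1 &lt; 2 &amp; &quot;quotes&quot;
```

If the tags really have to survive as markup, neither of these calls is enough on its own, which is why the whitelist-based parsing library is the way to go.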