My goal is to take HTML entered by an end user, remove certain unsafe tags like <script>, and add it to the document. Does anybody know of a good JavaScript library to sanitize HTML?
I searched around and found a few online, including John Resig’s HTML parser, Erik Arvidsson’s simple html parser, and Google’s Caja Sanitizer, but I haven’t been able to find much information about whether people have had good experiences using these libraries, and I’m worried that they aren’t really robust enough to handle arbitrary HTML. Would I be better off just sending the HTML to my Java server for sanitization?
You can parse HTML with jQuery, but I’m pretty sure any blacklist-based (i.e. “filtering out”) approach to sanitizing is going to fail. You probably need a whitelist-based (“filtering in”) approach, and ultimately you don’t want to rely on JavaScript for security anyway, since client-side checks can be bypassed. In any case, for reference, you can use jQuery for DOM parsing like this:
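As a rough sketch of the “filtering in” idea, the following parses the input with jQuery's `$.parseHTML` into an inert document and copies across only whitelisted tags and attributes. The whitelist, the URL pattern, and the names `ALLOWED`, `SAFE_URL`, `isAllowedTag`, `isAllowedAttr`, `copyClean` and `sanitize` are all illustrative assumptions, not part of any library. It assumes jQuery 1.8 or later running in a browser, and it is not a substitute for sanitizing on the server.

```javascript
// Hypothetical whitelist: upper-case tag name -> allowed attribute names.
const ALLOWED = {
  A: ["href", "title"],
  B: [], I: [], EM: [], STRONG: [],
  P: [], BR: [], UL: [], OL: [], LI: [],
};

// Assumption: only absolute http(s) and mailto links are acceptable.
const SAFE_URL = /^(https?:|mailto:)/i;

function isAllowedTag(tagName) {
  return Object.prototype.hasOwnProperty.call(ALLOWED, tagName.toUpperCase());
}

function isAllowedAttr(tagName, name, value) {
  const tag = tagName.toUpperCase();
  if (!isAllowedTag(tag)) return false;
  const attr = name.toLowerCase();
  if (!ALLOWED[tag].includes(attr)) return false;
  // URL-bearing attributes get an extra scheme check.
  if (attr === "href") return SAFE_URL.test(String(value).trim());
  return true;
}

// Copy `node` into `parent`, keeping only text and whitelisted elements.
// Anything not on the whitelist is dropped together with its children.
function copyClean(node, parent, doc) {
  if (node.nodeType === 3) {
    // Text node: safe to copy as text.
    parent.appendChild(doc.createTextNode(node.nodeValue));
  } else if (node.nodeType === 1 && isAllowedTag(node.tagName)) {
    const el = doc.createElement(node.tagName);
    Array.from(node.attributes).forEach(function (a) {
      if (isAllowedAttr(node.tagName, a.name, a.value)) {
        el.setAttribute(a.name, a.value);
      }
    });
    Array.from(node.childNodes).forEach(function (child) {
      copyClean(child, el, doc);
    });
    parent.appendChild(el);
  }
}

// Browser-only entry point; requires jQuery to be loaded as `$`.
function sanitize(html) {
  // Parse into a separate document so nothing is attached to the live page.
  const inert = document.implementation.createHTMLDocument("");
  const nodes = $.parseHTML(html, inert, false) || [];
  const out = inert.createElement("div");
  nodes.forEach(function (n) { copyClean(n, out, inert); });
  return out.innerHTML;
}
```

In a page you would then write something like `$("#target").html(sanitize(userInput))`, where `#target` and `userInput` are placeholders. Rebuilding a clean copy instead of deleting bad nodes means anything the whitelist does not mention is excluded by default.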