I’m working on a web framework and am trying to build XSS prevention into it. I have set it up so that it escapes incoming data for storage in the database, but sometimes you want to save HTML that the user generates. I am trying to make a custom tag that will prevent any JavaScript from executing. Here is my first hack at it:
<html>
<head>
<script type="text/javascript" src="/js/jquery.min.js"></script>
</head>
<body>
<preventjs>
<div id="user-content-area">
<!-- evil user content -->
<p onclick="alert('evil stuff');">I'm not evil, promise.</p>
<p onmouseover="alert('evil stuff');">Neither am I.</p>
<!-- end user content -->
</div>
</preventjs>
<script type="text/javascript">
// <preventjs> tags are supposed to prevent any javascript events
// but this does not unbind DOM events
$("preventjs").find("*").unbind();
</script>
</body>
</html>
I tried using jQuery to unbind everything, but it doesn’t remove event handlers declared inline in the DOM (such as the onclick and onmouseover attributes above), which is exactly what I’m trying to do. Is it possible to unbind all events for a DOM element?
Your problem is that you are doing this on the wrong end of things: you should be filtering potentially hostile content out of all user input when you receive it.
The first rule of thumb when doing this is “always whitelist, never blacklist”. Rather than allowing any and all attributes in your user-generated HTML, keep a list of allowed attributes and strip away all others when you receive the HTML (possibly on the client side, definitely on the server side).
Oh, and HTML is not a regular language. You’ll want to use an HTML parser, not a regular expression, for this task.
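To make the whitelist idea concrete, here is a minimal sketch of a server-side filter built on a real HTML parser. It assumes a Python backend (the question does not say what the framework is written in) and uses only the standard library's `html.parser`. The names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `WhitelistSanitizer`, and `sanitize` are hypothetical, and the whitelist contents are only an example. It also whitelists tags as well as attributes, which goes slightly beyond the attribute list described above. Treat it as an illustration of the approach rather than a vetted sanitizer; for production use, a maintained sanitizing library is likely the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: adjust to whatever your framework should permit.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "a"}
ALLOWED_ATTRS = {"a": {"href", "title"}}  # attributes allowed per tag
VOID_TAGS = {"br"}                        # tags that take no closing tag
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:", "/")
DROP_CONTENT = {"script", "style"}        # drop these tags AND their contents


class WhitelistSanitizer(HTMLParser):
    """Re-emits only whitelisted tags/attributes; everything else is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # onclick, onmouseover, style, ... never get through
            value = value or ""
            # URLs are whitelisted by prefix, so javascript: links are rejected.
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # The parser decodes entities, so text is re-escaped on output.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize("<p onclick=\"alert('evil stuff');\">I'm not evil, promise.</p>"))
```

Run against the first paragraph from the question, this should keep the `<p>` element and its text while discarding the `onclick` attribute. Note that the sketch does not balance or repair tags, so malformed input can still produce malformed (but handler-free) output.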