On the one hand, if I have
<script>
var s = 'Hello </script>';
console.log(s);
</script>
the browser terminates the <script> block at the first </script> it sees, even inside a string literal, and the rest of the page is broken.
On the other hand, the value of the string may come from a user (say, via a previously submitted form, and now the string ends up being inserted into a <script> block as a literal), so you can expect anything in that string, including maliciously formed tags. Now, if I escape the string literal with htmlentities() when generating the page, the value of s will contain the escaped entities literally, i.e. s will output
Hello &lt;/script&gt;
which is not desired behavior in this case.
One way of properly escaping JS strings within a <script> block is escaping the slash if it follows the left angle bracket, or just always escaping the slash, i.e.
var s = 'Hello <\/script>';
This seems to be working fine.
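A minimal sketch of that slash-escaping idea, assuming the server-side code happens to be JavaScript (the same steps apply in PHP or any other language). The helper name toScriptLiteral is hypothetical. It relies on JSON.stringify to produce a quoted literal with quotes, backslashes and control characters escaped, and then only handles the "</" case discussed above:

```javascript
// Hypothetical helper: turn an arbitrary string into a JS string literal
// that can be placed inside a <script> block.
function toScriptLiteral(value) {
  // JSON.stringify adds the surrounding double quotes and escapes
  // double quotes, backslashes and control characters.
  return JSON.stringify(String(value))
    // Escape the slash in "</" so the HTML parser never sees "</script>".
    .replace(/<\//g, '<\\/');
}

const literal = toScriptLiteral('Hello </script>');
console.log(literal); // "Hello <\/script>"
```

The generated page would then contain var s = "Hello <\/script>"; and the JS engine reads \/ as a plain slash, so s still holds the original text.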
Then comes the question of JS code within HTML event handlers, which can be easily broken too, e.g.
<div onClick="alert('Hello ">')"></div>
looks valid at first but breaks in most (or all?) browsers, because the double quote inside the string ends the attribute value early. This obviously requires full HTML entity encoding.
My question is: what is the best/standard practice for properly covering all the situations above – i.e. JS within a script block, JS within event handlers – if your JS code can partly be generated on the server side and can potentially contain malicious data?
The following characters could interfere with an HTML or JavaScript parser and should be escaped in string literals: <, >, ", ', \, and &.
In a script block, using the escape character works, as you found out. The concatenation method (</scr' + 'ipt>') can be hard to read.
For inline JavaScript in HTML, you can use entities:
Demo: http://jsfiddle.net/ThinkingStiff/67RZH/
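As a rough sketch of the entity approach for inline handlers (this is not the code from the demo; the helper name escapeHtmlAttr and the sample message are assumptions): build a valid JS expression first, then entity-encode the whole thing for the attribute value. The browser decodes the entities before it hands the text to the JS engine.

```javascript
// Hypothetical helper: HTML-escape text for use inside a quoted attribute value.
function escapeHtmlAttr(text) {
  return String(text)
    .replace(/&/g, '&amp;')  // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Layer 1: make the value a valid JS string literal.
const message = 'Hello ">';
const handler = 'alert(' + JSON.stringify(message) + ')';

// Layer 2: entity-encode the whole handler for the HTML attribute.
const html = '<div onclick="' + escapeHtmlAttr(handler) + '"></div>';
console.log(html);
// <div onclick="alert(&quot;Hello \&quot;&gt;&quot;)"></div>
```

The order matters: escaping for JS happens first and escaping for HTML second, which mirrors the order in which the browser undoes them.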
The method that works in both <script> blocks and inline JavaScript is \uxxxx, where xxxx is the hexadecimal character code.
< – \u003c
> – \u003e
" – \u0022
' – \u0027
\ – \u005c
& – \u0026
Demo: http://jsfiddle.net/ThinkingStiff/Vz8n7/
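A small sketch of how the six characters listed above could be replaced automatically (the function name unicodeEscape is hypothetical, and this is not the code from the demo). It only covers those six characters, so it assumes the input contains no line breaks or other control characters; those would need their own escapes.

```javascript
// Hypothetical helper: replace <, >, ", ', \ and & with \uXXXX escapes
// so the result can sit between quotes in a JS string literal.
function unicodeEscape(text) {
  return String(text).replace(/[<>"'\\&]/g, function (ch) {
    // charCodeAt gives the UTF-16 code unit; pad it to four hex digits.
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

const escaped = unicodeEscape('Hello </script> "&\'');
console.log(escaped);
// Hello \u003c/script\u003e \u0022\u0026\u0027
```

Because the output contains no angle brackets, quotes or ampersands, the same escaped text can be dropped into var s = '...'; inside a <script> block or into onclick="alert('...')" without further HTML encoding.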
HTML: