I’m writing the JS for a chat application I’m working on in my free time, and I need to have HTML identifiers that change according to user-submitted data. This is usually something conceptually shaky enough that I would not even attempt it, but I don’t see myself having much of a choice this time. What I need to do, then, is escape the HTML id to make sure it won’t allow XSS or break the HTML.
Here’s the code:
var user_id = escape(id)
var txt = '<div class="chut">'+
'<div class="log" id="chut_'+user_id+'"></div>'+
'<textarea id="chut_'+user_id+'_msg"></textarea>'+
'<label for="chut_'+user_id+'_to">To:</label>'+
'<input type="text" id="chut_'+user_id+'_to" value='+user_id+' readonly="readonly" />'+
'<input type="submit" id="chut_'+user_id+'_send" value="Message"/>'+
'</div>';
What would be the best way to escape id to avoid any kind of problem mentioned above? As you can see, right now I’m using the built-in escape() function, but I’m not sure of how good this is supposed to be compared to other alternatives. I’m mostly used to sanitizing input before it goes in a text node, not an id itself.
Never use escape(). It’s nothing to do with HTML-encoding. It’s more like URL-encoding, but it’s not even properly that. It’s a bizarre non-standard encoding available only in JavaScript.
If you want an HTML encoder, you’ll have to write it yourself as JavaScript doesn’t give you one. For example:
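A minimal sketch of such an encoder, assuming these five characters are the ones you need to neutralise in text and in quoted attribute values (the name encodeHTML is made up for illustration):

```javascript
// Hypothetical helper: replace the characters that are special in HTML
// text and in quoted attribute values with character references.
function encodeHTML(s) {
    return String(s)
        .replace(/&/g, '&amp;')   // must run first, or it would re-encode the others
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}
```

Note this only helps inside a quoted attribute, so the input would need to be built as 'value="' + encodeHTML(id) + '"' rather than with a bare, unquoted value.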
However, whilst this is enough to put your user_id in places like the input value, it’s not enough for id because IDs can only use a limited selection of characters. (And % isn’t among them, so escape() or even encodeURIComponent() is no good.)
You could invent your own encoding scheme to put any characters in an ID, for example:
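One possible scheme, as a sketch only: keep ASCII letters, digits and hyphens, and turn everything else (including the underscore itself, so the result stays unambiguous) into _hex_ tokens. The name encodeID and the token format are assumptions, not a standard.

```javascript
// Hypothetical scheme: every character outside [a-zA-Z0-9-] becomes
// "_" + its UTF-16 code unit in hex + "_". The underscore is encoded too,
// so two different inputs cannot produce the same output.
function encodeID(s) {
    return String(s).replace(/[^a-zA-Z0-9-]/g, function (ch) {
        return '_' + ch.charCodeAt(0).toString(16) + '_';
    });
}

// Used behind a fixed prefix, so the id always starts with a letter.
var exampleId = 'chut_' + encodeID('a b'); // 'chut_a_20_b'
```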
But you’ve still got a problem if the same user_id occurs twice. And to be honest, the whole thing with throwing around HTML strings is usually a bad idea. Use DOM methods instead, and retain JavaScript references to each element, so you don’t have to keep calling getElementById, or worrying about how arbitrary strings are inserted into IDs. eg.:
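A rough sketch of the DOM approach for the markup in the question. The function name createChat and the shape of the returned object are made up here, and the document is passed in as a parameter purely so the function is easy to exercise outside a browser; in a page you would pass the global document.

```javascript
// Hypothetical helper: build the chat box with DOM methods and hand back
// references to the elements, so no ids are needed at all.
function createChat(doc, userId) {
    var chat = {
        root: doc.createElement('div'),
        log: doc.createElement('div'),
        msg: doc.createElement('textarea'),
        to: doc.createElement('input'),
        send: doc.createElement('input')
    };
    chat.root.className = 'chut';
    chat.log.className = 'log';

    chat.to.type = 'text';
    chat.to.value = userId;      // set as a property: no escaping involved
    chat.to.readOnly = true;

    chat.send.type = 'submit';
    chat.send.value = 'Message';

    // Wrapping the input in its label avoids needing for="..." and an id.
    var label = doc.createElement('label');
    label.appendChild(doc.createTextNode('To: '));
    label.appendChild(chat.to);

    chat.root.appendChild(chat.log);
    chat.root.appendChild(chat.msg);
    chat.root.appendChild(label);
    chat.root.appendChild(chat.send);
    return chat;
}
```

Something like var chat = createChat(document, id); document.body.appendChild(chat.root); would then put it on the page, and chat.log, chat.msg and so on are available later without any lookup by id.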
You could also use a convenience function or JS framework to cut down on the lengthiness of the create-set-appends calls there.
ETA:
OK, then consider the jQuery 1.4 creation shortcuts, eg.:
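A sketch of what that could look like for the same chat box, assuming jQuery 1.4 or later. The function name and the returned object are made up for illustration, and jQuery is passed in as a parameter only to keep the snippet self-contained. Note the tag strings are bare ('<input>', not '<input type="text">'), since the properties object is only applied when the first argument is a single tag with no attributes.

```javascript
// Hypothetical helper using the jQuery(html, attributes) creation shortcut.
function createChatWithJQuery($, userId) {
    var chat = {
        log: $('<div>', { 'class': 'log' }),   // 'class' must be quoted
        msg: $('<textarea>'),
        to: $('<input>', { type: 'text', value: userId, readonly: true }),
        send: $('<input>', { type: 'submit', value: 'Message' })
    };
    // The label wraps its input, so no id or for="..." is required.
    var label = $('<label>', { text: 'To: ' }).append(chat.to);
    chat.root = $('<div>', { 'class': 'chut' })
        .append(chat.log, chat.msg, label, chat.send);
    return chat;
}
```

With jQuery loaded, createChatWithJQuery(jQuery, id).root.appendTo(document.body) would add the box to the page.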
You can keep a lookup of user_id to element nodes (or wrapper objects) in JavaScript, to save putting that information in the DOM itself, where the characters that can go in an id are restricted.
(The _map_ prefix is because JavaScript objects don’t quite work as a mapping of arbitrary strings. The empty string and, in IE, some Object member names, confuse it.)
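A sketch of such a lookup with the _map_ prefix applied to every key. The names chatsByUser, rememberChat and findChat are made up for illustration, and the stored value can be whatever wrapper object or element node you keep per user.

```javascript
// Plain object used as a string-keyed map. Every key gets a "_map_" prefix
// so that '' or names like 'toString' cannot clash with built-in members.
var chatsByUser = {};

function rememberChat(userId, chat) {
    chatsByUser['_map_' + userId] = chat;
}

function findChat(userId) {
    var key = '_map_' + userId;
    // hasOwnProperty guards against anything inherited from Object.prototype.
    return Object.prototype.hasOwnProperty.call(chatsByUser, key)
        ? chatsByUser[key]
        : null;
}
```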