I've been using Apache's `StringEscapeUtils` for HTML entities, but if you want to escape HTML attribute values, is there a standard way to do this? I guess that using the `escapeHtml` function won't cut it, since otherwise why would the Owasp `Encoder` interface have two different methods to cope with this?
Does anyone know what is involved in escaping HTML attributes vs. entities and what to do about attribute encoding in the case that you don’t have the Owasp library to hand?
It looks like this is Rule #2 of Owasp's XSS Prevention Cheat Sheet. Note the bit where it says that properly quoted attributes can only be escaped with the corresponding quote.
Therefore, I guess that as long as the attributes are correctly bounded with double or single quotes and you escape those quotes (i.e. double quote (`"`) becomes `&quot;` and single quote (`'`) becomes `&#x27;` (or `&#39;`)), you should be OK. Note that Apache's `StringEscapeUtils.escapeHtml` will be insufficient for this task, since it does not escape the single quote (`'`); you should use String's `replaceAll` method to do this. Call `escapeHtml` first and `replaceAll` second, otherwise the `&` in the new `&#39;` entity gets escaped again.

Otherwise, if the attribute is written without quotes, e.g. `<div attr=some_value>`, then you need to follow the recommendation on that page and escape every non-alphanumeric character with an ASCII value below 256 using the `&#xHH;` format. I'm not sure if there is a non-Owasp standard implementation of this, though. However, I guess it's good practice not to write attributes in this manner anyway!
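For the case where neither the Owasp library nor Commons Lang is to hand, here is a minimal, dependency-free sketch of both approaches. `AttributeEscaper`, `escapeQuotedAttribute` and `escapeUnquotedAttribute` are hypothetical names, not part of any library. The quoted version escapes the four ASCII characters `escapeHtml` already handles (`&`, `<`, `>`, `"`) plus the single quote; it leaves non-ASCII characters alone, assuming the page is served with a matching character encoding. The unquoted version applies the "every non-alphanumeric character below 256 becomes `&#xHH;`" rule, assuming "alphanumeric" means plain ASCII letters and digits.

```java
// Hypothetical helpers for illustration only; not the Owasp or Apache API.
public final class AttributeEscaper {

    private AttributeEscaper() {}

    /** For values inside a quoted attribute, e.g. <div title="VALUE"> or title='VALUE'. */
    public static String escapeQuotedAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // must be escaped so entities can't be forged
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break; // closes a double-quoted attribute
                case '\'': out.append("&#x27;"); break; // closes a single-quoted attribute
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** For values in an unquoted attribute, e.g. <div attr=VALUE>. */
    public static String escapeUnquotedAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() * 2);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean asciiAlnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (c < 256 && !asciiAlnum) {
                // Space, quotes, '=', '>' etc. can all end an unquoted value, so encode them all.
                out.append(String.format("&#x%02X;", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String attack = "x' onmouseover='alert(1)";
        // prints: x&#x27; onmouseover=&#x27;alert(1)
        System.out.println(escapeQuotedAttribute(attack));
        // prints: x&#x27;&#x20;onmouseover&#x3D;&#x27;alert&#x28;1&#x29;
        System.out.println(escapeUnquotedAttribute(attack));
    }
}
```

The first output is only safe inside quotes: the space and `=` survive, so in an unquoted attribute they would still start a new `onmouseover` attribute. The second output contains no character that can terminate an unquoted value.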
Note that this only applies when you are inserting a standard attribute value; if the attribute is an `href` or some JavaScript event handler, then it's a different story. For examples of possible XSS attacks that can occur from unsafe code inside event handler attributes, see: http://ha.ckers.org/xss.html.
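To see why `href` is different: a value such as `javascript:alert(1)` contains none of `&`, `<`, `>`, `"` or `'`, so quoted-attribute escaping leaves it unchanged and it still runs when the link is clicked. A common extra step, assumed here rather than prescribed by the post, is to allow-list URL schemes before attribute-escaping the value. The sketch below is hypothetical (`HrefCheck`, `isAllowedHref` and the chosen schemes are illustrative assumptions), needs Java 9+ for `Set.of`, and does nothing for JavaScript event handler attributes.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical scheme allow-list check for href values; illustration only.
public final class HrefCheck {

    // Assumed policy: only these schemes (or relative URLs) are accepted.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https", "mailto");

    private HrefCheck() {}

    /** True for relative URLs or URLs whose scheme is on the allow-list. */
    public static boolean isAllowedHref(String url) {
        // trim() drops leading/trailing spaces and control characters.
        String trimmed = url.trim();
        int colon = trimmed.indexOf(':');
        int slash = trimmed.indexOf('/');
        // No colon, or a '/' before the first ':', means there is no scheme: relative URL.
        if (colon < 0 || (slash >= 0 && slash < colon)) {
            return true;
        }
        String scheme = trimmed.substring(0, colon).toLowerCase(Locale.ROOT);
        // Anything odd in the scheme (tabs, entities, unknown names) is rejected.
        return ALLOWED_SCHEMES.contains(scheme);
    }

    public static void main(String[] args) {
        System.out.println(isAllowedHref("https://example.com/"));
        System.out.println(isAllowedHref("  JavaScript:alert(1)"));
    }
}
```

A value that passes this check still needs to be attribute-escaped before it is written into the page; the check only rules out dangerous schemes.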