I’ve noticed that OWASP recommends a different encoding method specifically for HTML attributes, and ASP.NET MVC also provides a helper method for encoding attributes.
However, I haven’t been able to think of any situation where an HTML-encoded string wouldn’t work in the context of an HTML attribute. Are there cases where using standard HTML encoding would be insufficient or incorrect? If not, why are these extra methods provided in some frameworks?
(Note that not all string escaping frameworks provide such methods.)
When you take a deeper look at the reference implementation, the `encodeForHTMLAttribute` method calls the `encode` method of the `HTMLEntityCodec` class with a set of immune characters that do not need to be encoded. Inside the `encode` method, which is inherited from the `Codec` class, you can see that any non-alphanumeric character that is not in the immune set is encoded as a character reference.

As you have already noticed, the immune sets for HTML and for HTML attributes differ. In particular, the space character is immune in the HTML set but not in the HTML attribute set.
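To make the immune-set mechanism concrete, here is a minimal Java sketch. The class and method names (`ImmuneSetEncoder`, `encode`) are hypothetical and are not the ESAPI API. The two immune sets are assumptions chosen to differ only in the space character. Unlike the real codec, this sketch always emits hexadecimal numeric references and never named entities.

```java
// Hypothetical sketch of immune-set encoding; not the ESAPI API.
public class ImmuneSetEncoder {

    // Assumed immune sets: identical except that the HTML set treats the space as immune.
    static final char[] IMMUNE_HTML = { ',', '.', '-', '_', ' ' };
    static final char[] IMMUNE_HTMLATTR = { ',', '.', '-', '_' };

    static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9');
    }

    static boolean isImmune(int cp, char[] immune) {
        for (char c : immune) {
            if (c == cp) return true;
        }
        return false;
    }

    // Encodes every code point that is neither an ASCII letter/digit nor in the
    // immune set as a hexadecimal numeric character reference, e.g. ' ' -> "&#x20;".
    static String encode(String input, char[] immune) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (isAsciiAlphanumeric(cp) || isImmune(cp, immune)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "John Smith";
        System.out.println(encode(value, IMMUNE_HTML));     // John Smith
        System.out.println(encode(value, IMMUNE_HTMLATTR)); // John&#x20;Smith
    }
}
```

The only visible difference between the two calls is the space, which is exactly the character that matters for unquoted attribute values.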
The space is probably excluded from the attribute immune set because HTML attribute values do not necessarily need to be quoted. When the quotes are missing, a literal space character ends the attribute value. In that case the space needs to be encoded as a character reference so that it is interpreted as part of the value.
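The effect on an unquoted attribute can be illustrated with another small Java sketch. `readUnquotedValue` is a hypothetical, heavily simplified model of the HTML tokenizer's unquoted attribute value state: it stops at tab, LF, FF, space, or `>`. `decodeHexRefs` decodes only hexadecimal numeric references, which is an assumption made to keep the example short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of how an unquoted attribute value is read.
public class UnquotedAttributeDemo {

    // An unquoted attribute value ends at ASCII whitespace (tab, LF, FF, space) or '>'.
    static String readUnquotedValue(String afterEquals) {
        int end = 0;
        while (end < afterEquals.length()) {
            char c = afterEquals.charAt(end);
            if (c == '\t' || c == '\n' || c == '\f' || c == ' ' || c == '>') break;
            end++;
        }
        return afterEquals.substring(0, end);
    }

    // Decodes only hexadecimal numeric character references such as "&#x20;".
    static String decodeHexRefs(String s) {
        Matcher m = Pattern.compile("&#x([0-9a-fA-F]+);").matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(new String(Character.toChars(cp))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The markup that follows "<input value=" in each case.
        String spaceLeftAsIs = "John Smith>";
        String spaceEncoded = "John&#x20;Smith>";
        System.out.println(readUnquotedValue(spaceLeftAsIs));                 // John
        System.out.println(decodeHexRefs(readUnquotedValue(spaceEncoded)));   // John Smith
    }
}
```

With the space left as-is, the value is cut off at `John` and `Smith` is parsed as a separate attribute name. With the space encoded, the whole string stays inside the value.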