I have been reading that you should HTML-encode data on the way back from the server to the client (I think?) and that this prevents many types of XSS attacks. However, I don't understand why. The HTML is still going to be consumed and rendered by the browser, right?
How does this stop anything?
I’ve read about this in multiple locations, websites and books, and nowhere does it actually explain why this works.
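Think about it: what does encoded HTML look like? For example, the tag <a href="www.stackoverflow.com"> could look like this once encoded:
&lt;a href=&quot;www.stackoverflow.com&quot;&gt;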
So the browser renders it as literal text (<a href="www.stackoverflow.com">), not as HTML. That means you won't see an actual link, but the code itself.
XSS attacks work on the basis that someone can make a client's browser parse HTML that the site provider didn't intend to be there; if the above weren't encoded, the provided link would be embedded in the page as a real link, even though the site provider didn't want that.
XSS is of course a little more elaborate than that, and usually involves JavaScript as well (hence the name Cross-Site Scripting), but for demonstration purposes this simple example should suffice. It works the same way with JavaScript code as with simple HTML tags, since XSS is a special case of the more general HTML injection.
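To make this concrete, here is a minimal sketch in Python. It assumes the server builds its markup by hand and uses the standard library's html.escape for the encoding step; render_comment is a hypothetical helper made up for illustration, not part of any particular framework.

```python
import html


def render_comment(user_input: str) -> str:
    """Hypothetical helper: encode untrusted text before placing it in markup."""
    # html.escape replaces &, <, > and (by default) both quote characters
    # with character references, so the browser shows them as text.
    return "<p>" + html.escape(user_input) + "</p>"


# The link from the example above, and a typical script payload.
link = '<a href="www.stackoverflow.com">'
script = "<script>alert(1)</script>"

print(render_comment(link))
print(render_comment(script))
```

In both cases the only real tags left in the output are the <p> and </p> that the server added itself. The submitted text reaches the browser as character references, so it is displayed rather than parsed as markup or run as script.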