Let’s say I have the following code on my view page (in ASP.NET MVC 3 Razor):
<a href='https://example.com/search?q=@Url.Encode(Model.UserInput)'>Click here</a>
Model.UserInput is a user input string that could contain any characters.
Is this totally safe in terms of HTML injection and cross-site scripting? Or should I also HTML encode the query string after URL encoding it?
Of course, usually I’d eliminate dangerous input before this stage, but that’s not the point.
Maybe it is, maybe it isn’t. I would approach this problem from another angle, ignoring safety just for now…
URL encoding serves a purpose: percent-encoding (which is its actual name) a URL. Imagine “URL encoding” replaced all spaces with <space width='1'> instead of the actual %20 or whatever the heck it does now. The URL “...?q=foo bar” would become, in our imaginary example, “...?q=foo<space width='1'>bar” and be a correctly “URL encoded” URL. This might be useful in a PDF or CSV file or whatever other type of output you’d be creating, but in HTML this would cause trouble: in your case, the ' would “end” the href attribute, leaving 1'> as garbage.
Because your output is intended for HTML you should actually, IMHO at least, do HTMLEncode(URLEncode(MyUrl)) (pseudocode).
Remember this: escaping is always done within a specific context. For SQL you need some “mysql_real_escape”-alike stuff to escape quotes etc. to avoid SQL injection vulnerabilities. In HTML you need to escape characters like " and <. In an RTF file you would need to escape yet other characters (I don’t actually know the details, but \ would become \\ or something similar). In a CSV file you’d need to escape , or ; within a field value, and in JSON output a " inside a string needs to be escaped as \". Each type of output (format) needs its own escaping/encoding.
What you are now doing is “nesting contexts”: you’re nesting a “URL context” in an “HTML context”. So you’d have to escape/encode accordingly, once for each context.
As TrueBlue demonstrates, it is not safe.
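To make the HTMLEncode(URLEncode(...)) nesting suggested above concrete, here is a minimal sketch in Python (chosen only for illustration; the question itself uses Razor). It assumes the standard-library functions urllib.parse.quote and html.escape as stand-ins for the URLEncode and HTMLEncode pseudocode. The function name build_search_link and the extra lang=en parameter are hypothetical, added only to show an & that percent-encoding the value alone would leave untouched.

```python
from html import escape
from urllib.parse import quote


def build_search_link(user_input: str) -> str:
    """Build an <a> tag whose href carries untrusted input (illustrative only)."""
    # Inner context (URL): percent-encode the value so it is a valid
    # query-string component. safe="" means "/" is encoded as well.
    query_value = quote(user_input, safe="")

    # Hypothetical second parameter, so the URL contains a literal "&".
    url = "https://example.com/search?q=" + query_value + "&lang=en"

    # Outer context (HTML attribute): escape the complete URL, including
    # both quote characters, before placing it inside the attribute.
    href = escape(url, quote=True)

    return "<a href='" + href + "'>Click here</a>"


if __name__ == "__main__":
    print(build_search_link("foo bar'<x>"))
```

With the input foo bar'<x>, the value becomes foo%20bar%27%3Cx%3E and the & between the two parameters is written as &amp;, so nothing inside the attribute can close the quote or open a tag. Whether a given framework's URL encoder also encodes ' is something to check for that framework, which is why the outer HTML-encoding step should not be skipped.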