In my application, some text may or may not already be HTML-escaped, depending on where the data came from. I want to ensure the unescaped text gets escaped, but the already escaped text doesn’t get escaped again.
How do people typically solve this?
You can’t tell from the data.
For example:
… could be “The HTML representation of Bob & Alice” or it could also be “The plain text representation of Bob & Alice” (e.g. from an HTML tutorial).
Since you say:
… keep track of where it comes from, and make sure you know if a source provides trusted HTML or plain text.
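One way to keep track of the source is to carry the "already HTML" fact in the type rather than guessing from the characters. The following is only a minimal sketch: `TrustedHtml` and `to_html` are hypothetical names made up for illustration, not part of any library; only `html.escape` comes from the Python standard library.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustedHtml:
    """Hypothetical wrapper: marks a string that a trusted source says is already HTML."""
    markup: str


def to_html(value):
    """Return markup ready to embed in a page.

    TrustedHtml values pass through unchanged; plain str values are escaped.
    """
    if isinstance(value, TrustedHtml):
        return value.markup
    return html.escape(value)


# Plain text from one source, pre-escaped HTML from another:
print(to_html("Bob & Alice"))                   # Bob &amp; Alice
print(to_html(TrustedHtml("Bob &amp; Alice")))  # Bob &amp; Alice
```

The decision is made once, at the point where the data enters the application, so nothing downstream has to inspect the string to guess whether it was escaped.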
If you don’t know, then how you handle it will depend on the context. The safe option is to assume it is always plain text and thus always encode it. That protects you from script injection attacks; the cost is that text which was already escaped will show its entities literally on the page.
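As a rough illustration of the always-encode fallback, here is a short sketch using Python's standard `html.escape`. The function name `render_untrusted` is an assumption introduced for this example.

```python
import html


def render_untrusted(text: str) -> str:
    """Treat the input as plain text and always escape it (hypothetical helper)."""
    return html.escape(text, quote=True)


# Markup in the input is neutralised instead of being executed:
print(render_untrusted("<script>alert(1)</script>"))
# &lt;script&gt;alert(1)&lt;/script&gt;

# Input that was already escaped gets escaped a second time:
print(render_untrusted("Bob &amp; Alice"))
# Bob &amp;amp; Alice
```

The second call shows the trade-off: the page will display a literal `&amp;`, which is a cosmetic problem, whereas skipping the escape on untrusted input would be a security problem.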