I know that < is not used inside an HTML tag. I know that it is illegal in URIs too. I can deduce that it is also invalid in class or id of an HTML element. I have never seen it used in styles either.
However, I need to know whether there is any unusual special case in which < could legitimately appear.
Let me elaborate. I want to match HTML tags in some text and, say, throw them away. In this text, < is normally escaped (written as \<), but I assume there could be user input mistakes. So I want to know whether there is ever a chance that something like this could be a tag:
<.........<......> <-- the whole thing is a tag
(where each . could be any character, such as ")
Or could I safely assume the first < was a mistake by the user?
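A minimal Python sketch of the stripping step, under stated assumptions. A tag is assumed to start with < followed by a letter, / plus a letter, or !. A < is assumed to be legitimate inside a tag only when it sits within a quoted attribute value. Under those assumptions, a second unquoted < before the closing > means the first one was stray, so it is left in the text. The \< escape handling (a lookbehind for a backslash) and the names TAG_RE and strip_tags are hypothetical illustrations, not part of any library. Real HTML parsers recover from malformed markup in their own ways, so this pattern only approximates them.

```python
import re

# Hypothetical pattern, not a full HTML grammar (see assumptions above).
TAG_RE = re.compile(
    r"(?<!\\)"                            # assumed rule: "\<" is an escaped literal, never a tag
    r"<(?:/?[A-Za-z]|!)"                  # opener: <name, </name, or <! (comment/doctype)
    r"(?:\"[^\"]*\"|'[^']*'|[^'\"<>])*"   # quoted values (may hold < or >) or plain chars
    r">"                                  # closing bracket ends the tag
)

def strip_tags(text: str) -> str:
    """Remove everything TAG_RE considers a tag; stray '<' characters stay in the text."""
    return TAG_RE.sub("", text)
```

For example, in 'oops <not <b>bold</b>' the first < is followed by another unquoted < before any >, so it is not treated as a tag opener and survives; only <b> and </b> are removed, giving 'oops <not bold'.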
The question sounds a bit too specific, so let me make it more general: what are the characters that absolutely cannot appear inside an HTML tag?
Is perfectly valid HTML.
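One way to see how a parser treats a < (or >) inside a quoted attribute value is to feed such markup to a real parser. This sketch uses Python's standard html.parser module only because it ships with Python. It is a tolerant parser, not a validator, so a successful parse here does not by itself prove the markup is valid. AttrCollector and start_tags are hypothetical helper names.

```python
from html.parser import HTMLParser

class AttrCollector(HTMLParser):
    """Records (tag, attrs) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed
        self.tags.append((tag, attrs))

def start_tags(markup: str):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags
```

Feeding it '<a title="1 < 2">x</a>' yields a single a tag whose title attribute is the string '1 < 2', so the < inside the quotes is read as part of the value rather than as the start of a new tag.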