I have tried to find an answer to this in the W3C HTML specifications, but haven’t had any luck so far.
For example, if I have the following HTML code:
<body>
<p>
<foo>bar</foo>
</p>
</body>
Does W3C specify how a user agent should handle this? E.g should the “foo” element be completely ignored? Should the “foo” element be ignored but the content “bar” parsed?
Also, is it even “legal” to do this?
Edit: Some excellent answers from all of you! I totally agree that it would be bad practice to embed generic XML unless, possibly, if you have complete control over which browser your users will use. I was mostly curious about what actually would or should happen if such markup were to be produced 🙂
Some excellent advice from @Andy E. This is just some add-ons to that.
The HTML5 draft does define how to parse unknown elements, however, it is distinctly non-trivial. To see the rules, see http://dev.w3.org/html5/spec/tree-construction.html
Note that the first version of Firefox to use these rules is FireFox 4, and the first version of IE to use the rules is IE 10. Older versions have a number of different and often very strange behaviours.
HTML has no notion of “legality”, only validity or conformance to a standard. You are free to decide whether you want your pages to conform to any particular standard or not. There is no W3C standard of HTML where use of arbitrarily named elements is conforming.
It is generally advisable to make your HTML conforming to avoid unpredictable errors in browsers and other HTML consumers that you haven’t tested against.