On Google Chrome (Canary), it seems no string can make the DOM parser fail. I’m trying to parse some HTML, but if the HTML isn’t completely, 100%, valid, I want it to display an error. I’ve tried the obvious:
var newElement = document.createElement('div');
newElement.innerHTML = someMarkup; // Might fail on IE, never on Chrome.
I’ve also tried the method in this question. Doesn’t fail for invalid markup, even the most invalid markup I can produce.
So, is there some way to parse HTML “strictly” in Google Chrome at least? I don’t want to resort to tokenizing it myself or using an external validation utility. If there’s no other alternative, a strict XML parser is fine, but certain elements don’t require closing tags in HTML, and preferably those shouldn’t fail.
Use the DOMParser to check a document in two steps:
1. Validate that the markup is well-formed XML by parsing it as XML.
2. Parse the markup as HTML, loop through each element, and check whether the DOM element is an instance of HTMLUnknownElement. For this purpose, getElementsByTagName('*') fits well.
(If you want to strictly parse the document, you have to recursively loop through each element and keep track of whether the element is allowed at that location, e.g. <area> in <map>.)
Demo: http://jsfiddle.net/q66Ep/1/
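The two checks (XML well-formedness, then a scan for HTMLUnknownElement) could look roughly like the sketch below. This is not the code from the demo. The name findMarkupProblems and its optional parser argument are hypothetical, the dummy <root> wrapper is an assumption made so that fragments with several top-level elements parse as XML, and the sketch assumes a browser whose DOMParser accepts 'text/html'.

```javascript
// Hypothetical helper (not the code from the demo): returns an array of
// problem descriptions; an empty array means both checks passed.
// `parser` is optional and defaults to the browser's DOMParser.
function findMarkupProblems(markup, parser) {
  parser = parser || new DOMParser();

  // Step 1: XML well-formedness. A failed XML parse does not throw; the
  // resulting document contains a <parsererror> element instead.
  // Assumption: the fragment is wrapped in a dummy <root> so that markup
  // with several top-level elements can still be parsed as XML.
  var xmlDoc = parser.parseFromString('<root>' + markup + '</root>', 'application/xml');
  if (xmlDoc.getElementsByTagName('parsererror').length > 0) {
    return ['not well-formed XML'];
  }

  // Step 2: parse as HTML and collect elements the browser does not recognise.
  var htmlDoc = parser.parseFromString(markup, 'text/html');
  var all = htmlDoc.getElementsByTagName('*');
  var problems = [];
  for (var i = 0; i < all.length; i++) {
    if (all[i] instanceof HTMLUnknownElement) {
      problems.push('unknown element: ' + all[i].tagName.toLowerCase());
    }
  }
  return problems;
}
```

In a browser, findMarkupProblems(someMarkup) can be called without the second argument. Two caveats of this sketch: markup that itself contains a <parsererror> element would be reported as malformed, and the <root> wrapper means it only suits fragments, not full documents with a doctype.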
See revision 1 of this answer for an alternative to XML validation without the DOMParser.
Considerations
null for <input type="text">, while it’s valid HTML5 (because the tag is not closed).