I am learning about HTML parsing. During the tokenization stage, the input stream is broken into tokens. How many token types does the standard HTML tokenizer support? Does it include things like a start tag token or a comment token?
Are comments treated as tokens and attached to the DOM tree?
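The HTML specification says that the output of the tokenization step is a series of zero or more of the following tokens: DOCTYPE, start tag, end tag, comment, character, and end-of-file.
So there are six different token types.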
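To get a feel for these token types, here is a rough sketch using Python's built-in `html.parser`. This module is not an implementation of the specification's tokenizer; its callbacks only correspond approximately to the token types, and the `TokenLogger` class and `tokenize` helper are names made up for this illustration.

```python
from html.parser import HTMLParser


class TokenLogger(HTMLParser):
    """Hypothetical helper: records parser events as (kind, value) pairs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>; roughly the DOCTYPE token.
        self.tokens.append(("DOCTYPE", decl))

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start tag", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end tag", tag))

    def handle_comment(self, data):
        self.tokens.append(("comment", data))

    def handle_data(self, data):
        # The specification emits one character token per character;
        # html.parser delivers runs of text instead.
        self.tokens.append(("character", data))


def tokenize(html):
    logger = TokenLogger()
    logger.feed(html)
    logger.close()
    # html.parser has no end-of-file event, so it is appended by hand here.
    logger.tokens.append(("end-of-file", None))
    return logger.tokens


tokens = tokenize("<!DOCTYPE html><p>Hi<!-- note --></p>")
for kind, value in tokens:
    print(kind, repr(value))
```

All six kinds show up for this small input, including the comment.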
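To answer your last question: yes, comments are tokens. During tree construction, a comment token is inserted into the document as a Comment node, and the DOM defines a Comment interface for it.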
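As a small illustration of a comment ending up as a node in the tree, here is a sketch with Python's `xml.dom.minidom`. Note the assumption: minidom is an XML parser, not an HTML parser, and is used here only because it ships with Python and exposes the DOM comment node type. The markup string is made up for the example.

```python
from xml.dom import Node, minidom

# Well-formed markup so that the XML parser accepts it.
doc = minidom.parseString("<div><!-- note --><p>Hi</p></div>")

first_child = doc.documentElement.firstChild

# The comment is a real node in the tree, with its own node type and data.
is_comment = first_child.nodeType == Node.COMMENT_NODE
comment_text = first_child.data

print(is_comment, repr(comment_text))
```

In a browser, the equivalent check would be against the Comment interface on the parsed document.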