I thought I had the answer to my question when, on w3.org, I read:
The element containing the character encoding declaration must be serialized completely within the first 512 bytes of the document.
Then after being linked to whatwg.org by the html5boilerplate docs [http://cl.ly/K7Vt] and reading this, I wasn’t sure which was correct:
The element containing the character encoding declaration must be serialized completely within the first 1024 bytes of the document.
Which is correct?
The first document quoted is a non-normative W3C document that aims to describe HTML5 as a markup language in an understandable way. The second is a WHATWG document that purports to be a “living standard” (an oxymoron) but is mostly in accordance with the W3C HTML5 document. That W3C document is a working draft, which means it is nowhere near official (not necessarily endorsed even by all working group members) but is meant to lead to a “standard” (a W3C Recommendation). On this point, the W3C draft says the same as the WHATWG document.
It seems to me that in W3C drafts, the number was raised from 512 to 1024 in the 05 April 2011 draft. The non-normative document presumably just hasn’t been updated in this respect.
There is no standard here, even in the loose W3C sense, and hence no definitive criterion for correctness. But 1024 bytes is apparently the figure authors are supposed to stay within. It should be understood as advice to authors; browsers may in fact apply a more liberal strategy.
In practical terms, the meta tag specifying the encoding should appear before any other elements of the head part. Even if you use verbose markup with <html> and <head> tags, you will be nowhere near even the 512-byte limit. If you have loads of comments at the start of the HTML document, just remove them; there are better places for documentation.
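If you want to check a particular file rather than eyeball it, a rough sketch along these lines may help. It looks for the first meta element carrying a charset and reports whether that element ends within the first 1024 bytes, the figure from the WHATWG text quoted in the question. The function names and the regular expression are my own illustration, not anything taken from either specification, and the pattern is a simplification: it does not implement the prescan algorithm browsers actually use, and it would be fooled by a meta tag sitting inside a comment.

```python
import re

# Assumed limit, taken from the WHATWG text quoted in the question.
PRESCAN_LIMIT = 1024

# Simplified, hypothetical pattern: matches <meta charset="..."> as well as the
# http-equiv form, where "charset=" appears inside the content attribute.
META_CHARSET = re.compile(
    rb"<meta\b[^>]*?charset\s*=\s*[\"']?[\w.:-]+[\"']?[^>]*>",
    re.IGNORECASE,
)


def charset_declaration_end(html_bytes):
    """Return the byte offset just past the first meta charset element, or None."""
    match = META_CHARSET.search(html_bytes)
    return match.end() if match else None


def declared_within_limit(html_bytes, limit=PRESCAN_LIMIT):
    """True if a meta charset element is found and ends within `limit` bytes."""
    end = charset_declaration_end(html_bytes)
    return end is not None and end <= limit
```

Note that the check works on raw bytes (for example, the result of reading the file in binary mode), since the limit is stated in bytes, not characters.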