Say that you have an XHTML document in English, but it contains accented characters (e.g. <meta name="author" content="José" />). Let’s say you have no control over the HTTP headers.
- Should the characters be replaced with their corresponding named entities (e.g. &aacute;, etc.)?
- Should the xml:lang attribute be set to English?
I know I can check the W3C recommendation but I am asking more from a practical point of view.
Since you can’t control the HTTP headers (and thus the declared character encoding), you should encode everything in ASCII, since it is a safe subset of just about every encoding a browser is likely to assume (UTF-8, ISO-8859-1, Windows-1252).
This requires using entities for anything that isn’t in ASCII. Named ones are preferred (as they are easier for people editing the HTML to handle) but not required; numeric character references such as &#233; work as well.
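As a rough sketch of that replacement step, here is one way it could be automated in Python. The function name to_ascii_entities is hypothetical. It uses the standard library's html.entities.codepoint2name table (the HTML 4.01 named entities) and falls back to a numeric reference when no name exists. It assumes the input is already valid markup, so ASCII characters such as < and & are left untouched.

```python
from html.entities import codepoint2name


def to_ascii_entities(text: str) -> str:
    """Return text with every non-ASCII character replaced by an entity.

    Hypothetical helper: prefers a named entity (e.g. &eacute;) and falls
    back to a decimal character reference (e.g. &#337;) when none exists.
    """
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            # Plain ASCII, including existing markup, passes through as-is.
            parts.append(ch)
        elif code_point in codepoint2name:
            parts.append(f"&{codepoint2name[code_point]};")
        else:
            parts.append(f"&#{code_point};")
    return "".join(parts)


if __name__ == "__main__":
    print(to_ascii_entities('<meta name="author" content="José" />'))
```

If named entities are not needed, text.encode("ascii", "xmlcharrefreplace") gives a numeric-only result with less code.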
The EN in the Doctype is a reference to the language that the comments in the DTD are written in. The HTML 3.x / 4.x and XHTML 1.x Doctypes must always use EN.

The lang attribute (and additionally the xml:lang attribute) should specify the language that the content is written in. If that is English, then it should be English.
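To make the distinction concrete, the sketch below holds a minimal ASCII-only XHTML 1.0 Strict page in a Python string and checks it with the standard library. The page content and variable names are made up for illustration. The author name uses a numeric reference rather than &eacute; only because Python's built-in parser does not read the external DTD and so would not recognise named entities.

```python
import xml.etree.ElementTree as ET

# Illustrative page: "EN" in the Doctype describes the DTD itself, while
# lang / xml:lang on <html> describe the language of the page content.
XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Example</title>
    <meta name="author" content="Jos&#233;" />
  </head>
  <body>
    <p>Hello</p>
  </body>
</html>
"""

XML_NS = "{http://www.w3.org/XML/1998/namespace}"
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# encode() raises UnicodeEncodeError if any non-ASCII character slipped in.
raw = XHTML_DOC.encode("ascii")
root = ET.fromstring(raw)

page_lang = root.get(XML_NS + "lang")
author = root.find(f"{XHTML_NS}head/{XHTML_NS}meta").get("content")

if __name__ == "__main__":
    print(page_lang, author)
```

The file on disk stays pure ASCII, yet the parsed author value still contains the accented character, which is the point of using entities here.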