I’m seeing some weird behavior when setting the title of an HTML page using JavaScript. If I put HTML character references directly into the title, the Unicode renders correctly, for instance:
<title>&#21543;&#20986;</title>
But if I attempt to use HTML character references via JavaScript, something seems to be converting the & to &amp;, breaking the encoding and causing the title to be rendered as the full coded string:
function execTitleChange() {
    document.title = "&#21543;&#20986;";
}
(I should note that this is a little bit of speculation; when I inspect the DOM using Firebug after executing this JavaScript function, that’s where I see &amp; instead of &.)
If I use \u encoded Unicode characters when setting the value from JavaScript then everything works correctly again:
function execTitleChange() {
    document.title = "\u5427\u51fa";
}
The fact that \u encoded characters work kind of makes sense to me, since I think that’s how JavaScript represents Unicode characters, but I’m stumped as to why the behavior is different when using HTML character references.
JavaScript string constants are parsed by the JavaScript parser. Text inside HTML tags is parsed by the HTML parser. The two languages (and, by extension, their parsers) are different, and in particular they have different ways of representing characters by character code.
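As a rough illustration of that difference, the sketch below compares the two notations as plain JavaScript string literals. The variable names are made up for the example; the point is only that the JavaScript parser resolves \u escapes and leaves HTML character references as ordinary text.

```javascript
// The JavaScript parser resolves \u escapes while reading the string literal,
// so this string holds two characters.
const fromEscapes = "\u5427\u51fa";

// JavaScript has no notion of HTML character references, so this string
// holds the sixteen literal characters & # 2 1 5 4 3 ; & # 2 0 9 8 6 ;
const fromReferences = "&#21543;&#20986;";

console.log(fromEscapes.length);              // 2
console.log(fromReferences.length);           // 16
console.log(fromReferences.startsWith("&#")); // true
```

Assigning the second string to document.title hands the browser those sixteen characters as text, which would explain why the DOM inspector displays the ampersand as &amp;.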
Thus, what you’ve discovered is the way reality actually is 🙂 Use the \u escape notation in JavaScript, and use HTML entities (&#nnnn;) in HTML/XML.
Edit: the situation can get even more confusing when you’re talking about creating or inserting HTML from JavaScript. When you use .innerHTML to update the DOM from JavaScript, you are basically handing HTML source code over to the HTML parser for interpretation. For that reason, you can use either JavaScript \u escapes or HTML entities, and things will work (excepting painful issues of character encoding mismatches etc.). Assigning to document.title, by contrast, sets plain text that never goes through the HTML parser, which is why the character references in the question show up literally.
Finally, note that JavaScript also provides the String.fromCharCode() function to construct strings from numeric character codes.
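Here is a minimal sketch of that last point. String.fromCharCode and String.fromCodePoint are standard; decodeNumericReferences is a hypothetical helper written for this example, not a built-in. It assumes well-formed numeric references only: named entities such as &amp; are left alone, and out-of-range code points are not validated.

```javascript
// String.fromCharCode builds a string from UTF-16 code units.
const title = String.fromCharCode(0x5427, 0x51fa);

// Hypothetical helper (not part of any standard API): turns decimal and
// hexadecimal numeric character references, such as &#21543; or &#x5427;,
// into the characters they stand for.
function decodeNumericReferences(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

console.log(title === "\u5427\u51fa");                              // true
console.log(decodeNumericReferences("&#21543;&#20986;") === title); // true

// In a browser, the decoded string could then be assigned as plain text:
// document.title = decodeNumericReferences("&#21543;&#20986;");
```

Decoding by hand like this keeps the title assignment a plain-text operation, so no markup is ever handed to the HTML parser.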