Let’s say I have the following basic HTML page:
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<meta charset=utf-8 />
<title>JS Bin</title>
</head>
<body>
\u00f2
</body>
</html>
When the page renders, I see \u00f2, whereas I was expecting ò. And here comes the big “but”: with the following JavaScript code, I do see the ò character (2 seconds later).
$(function(){
  // Replace the body content after 2 seconds
  window.setTimeout(function(){
    $("body").html("\u00f2");
  }, 2000);
});
My question is: why is this happening? I am aware that, rather than rendering the Unicode code points, I could convert them to HTML entities and render the correct character directly. The question is more for learning purposes.
Here is the jsbin
It happens because in HTML, \u00f2 is just a sequence of six characters; the backslash \ never has any special meaning in HTML. In JavaScript strings, \u00f2 has a special meaning: it denotes the Unicode code unit with hexadecimal number 00f2, i.e. the character “ò”.
Conversely, although you can use &ograve; in HTML to denote “ò”, you cannot do that in JavaScript, though you could use functions that convert &ograve; (which is just a sequence of eight characters from the JavaScript point of view) to “ò”. Moreover, if your JavaScript code is embedded in HTML, in a script element or in an event attribute, then browsers may, depending on certain rules, first interpret &ograve; by HTML rules before invoking the JavaScript interpreter.
In HTML documents, the modern, generally recommendable method is to enter the characters directly, using the UTF-8 encoding. You can do the same in JavaScript, too, e.g. $("body").html("ò"). However, this is sometimes avoided due to assumed or real complications in specifying the character encoding.
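To make the difference concrete, here is a minimal sketch in plain JavaScript (no jQuery needed). It compares the lengths of the three spellings and shows one possible way to convert the textual forms into the actual character. The helper names decodeUnicodeEscapes and decodeEntities and the NAMED_ENTITIES table are hypothetical, made up for this illustration; the table covers only a handful of names, not the full HTML entity list.

```javascript
// In a JavaScript string literal, \u00f2 is an escape for a single code unit.
const escaped = "\u00f2";   // the one character ò
const literal = "\\u00f2";  // six characters: \ u 0 0 f 2 (what HTML text contains)
const entity = "&ograve;";  // eight characters; only an HTML parser gives it meaning

console.log(escaped.length, literal.length, entity.length); // 1 6 8

// Hypothetical helper: turn textual \uXXXX sequences into the code units they name.
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// Hypothetical helper: decode numeric character references and a tiny,
// illustrative sample of named entities. Unknown names are left untouched.
const NAMED_ENTITIES = { ograve: "\u00f2", amp: "&", lt: "<", gt: ">" };

function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body.startsWith("#x")) {
      return String.fromCodePoint(parseInt(body.slice(2), 16));
    }
    if (body.startsWith("#")) {
      return String.fromCodePoint(parseInt(body.slice(1), 10));
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeUnicodeEscapes(literal) === escaped); // true
console.log(decodeEntities(entity) === escaped);        // true
console.log(decodeEntities("&#xF2; &#242;"));           // ò ò
```

In a browser you would more likely let the HTML parser do the entity decoding for you; the lookup table above is only there to keep the sketch self-contained.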