While writing an encryption method in JavaScript, I started wondering what character encoding my strings were using, and why.
What determines the character encoding in JavaScript? Is it fixed by a standard? Set by the browser? Determined by the headers of the HTTP request? By the <META> tag of the HTML that contains the script? By the server that serves the page?
By my empirical testing (changing different settings, then using charCodeAt on a sufficiently strange character and seeing which encoding the value matches up with) it appears to always be UTF-8 or UTF-16, but I’m not sure why.
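For illustration, the test described above might look something like this sketch. The two sample characters and the variable names are my own choices, nothing prescribed:

```javascript
// "€" is U+20AC. In UTF-8 it would be the three bytes E2 82 AC;
// in UTF-16 it is the single 16-bit unit 0x20AC.
const euro = "\u20AC";
const euroUnit = euro.charCodeAt(0).toString(16); // "20ac"

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
// needs two 16-bit units (a surrogate pair) to represent it.
const face = "\u{1F600}";
const faceLength = face.length; // 2
const faceUnits = [face.charCodeAt(0), face.charCodeAt(1)]
  .map((unit) => unit.toString(16)); // ["d83d", "de00"]

// codePointAt reassembles the pair into the original code point.
const faceCodePoint = face.codePointAt(0).toString(16); // "1f600"

console.log(euroUnit, faceLength, faceUnits, faceCodePoint);
```

Note that charCodeAt returns 16-bit values, never UTF-8 bytes, which is why a character like "€" comes back as a single number rather than three.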
After some frantic googling, I couldn’t seem to find a conclusive answer to this simple question.
Section 8.4 of ECMA-262:
That wording is kind of weaselly; it seems to mean that everything that matters treats a string as if each element were a UTF-16 code unit, but at the same time nothing ensures that the sequence as a whole is valid UTF-16.
To be clear, the intention is that strings consist of UTF-16 code units. In ES2015, the definition of "string value" includes this note:
So a string is still a string even when it contains values that don’t form valid Unicode characters, such as an unpaired surrogate.
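A minimal sketch of what that means in practice, assuming an engine that supports ES2015 or later; the variable names are just for illustration. An unpaired high surrogate is accepted as a string, and the problem only shows up when something tries to interpret it as text:

```javascript
// A high surrogate (0xD83D) with no low surrogate after it:
// this is not well-formed UTF-16.
const lone = "\uD83D";

// The engine still stores it as an ordinary one-element string.
const loneLength = lone.length;                   // 1
const loneUnit = lone.charCodeAt(0).toString(16); // "d83d"

// Converting it to UTF-8 is where it fails:
// encodeURIComponent throws a URIError for an unpaired surrogate.
let encodeError = null;
try {
  encodeURIComponent(lone);
} catch (e) {
  encodeError = e.name;
}

console.log(loneLength, loneUnit, encodeError);
```

For something like an encryption routine, this suggests being explicit about how strings get turned into bytes, rather than assuming every string is well-formed text.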