I have written a personal web app that uses charCodeAt() to convert text input by the user into the corresponding character codes (for example, ⊇ is converted to 8839 for storage). These codes are then sent to Perl, which passes them on to MySQL. To retrieve the input text, the app uses fromCharCode() to convert the numbers back to text.
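The round trip described above can be sketched as follows. This is a minimal illustration, not the app's actual code; the helper names encodeToCodes and decodeFromCodes are hypothetical, and only charCodeAt() and String.fromCharCode() come from the description.

```javascript
// Hypothetical helper: turn a string into an array of numbers for storage.
function encodeToCodes(text) {
  const codes = [];
  for (let i = 0; i < text.length; i++) {
    // charCodeAt returns the UTF-16 code unit at index i (0..65535).
    codes.push(text.charCodeAt(i));
  }
  return codes;
}

// Hypothetical helper: rebuild the string from the stored numbers.
function decodeFromCodes(codes) {
  return codes.map((code) => String.fromCharCode(code)).join("");
}

const stored = encodeToCodes("A ⊇ B");
console.log(stored);                  // [ 65, 32, 8839, 32, 66 ]
console.log(decodeFromCodes(stored)); // A ⊇ B
```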
I chose to do this because Perl’s Unicode support is very hard to deal with correctly. So Perl and MySQL only see numbers, which makes life a lot simpler.
My question is: can I depend on fromCharCode() to always convert a number like 8834 to the relevant character? I don’t know what standard it uses, but let’s say it uses UTF-8. If it is changed to use UTF-16 in the future, this will obviously break my program if there is no backward compatibility.
I know that my understanding of these concepts isn’t that clear, so please point out anything I’ve misunderstood.
fromCharCode() and charCodeAt() deal with Unicode code points, i.e. numbers between 0 and 65535 (0xFFFF), assuming all characters are in the Basic Multilingual Plane (BMP). Strictly speaking, they operate on UTF-16 code units, which are identical to code points for every BMP character. Unicode code points are permanent: once a code point is assigned to a character it is never reassigned, so you can trust them to remain the same forever.
Encodings such as UTF-8 and UTF-16 take a stream of code points (numbers) and output a byte stream. JavaScript is somewhat strange in that a character outside the BMP is stored as two code units (a surrogate pair), so reading it takes two calls to charCodeAt(), according to UTF-16 rules. However, most characters you’ll encounter (including the bulk of Chinese, Japanese etc.) are in the BMP, and your program will work even if you don’t handle these cases specially: as long as every code unit is stored and passed back to fromCharCode() in the same order, the pair is reproduced.
One thing you can do is convert the numbers back into bytes (in big-endian int16 format) and interpret the resulting bytes as UTF-16.
The behavior of fromCharCode() and charCodeAt() is fixed by the ECMAScript specification and will not change.
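To make the points about surrogate pairs and the big-endian byte form concrete, here is a small sketch. The helper name codeUnitsToBigEndianBytes is hypothetical and introduced only for illustration; charCodeAt(), String.fromCharCode(), codePointAt() and DataView are standard JavaScript.

```javascript
// U+1D49C (MATHEMATICAL SCRIPT CAPITAL A) lies outside the BMP,
// so JavaScript stores it as two UTF-16 code units.
const scriptA = "\u{1D49C}";
console.log(scriptA.length); // 2

// Two calls to charCodeAt are needed to read the whole character.
const units = [scriptA.charCodeAt(0), scriptA.charCodeAt(1)];
console.log(units.map((u) => u.toString(16))); // [ 'd835', 'dc9c' ]

// Passing both units back, in order, rebuilds the original character.
const rebuilt = String.fromCharCode(units[0], units[1]);
console.log(rebuilt === scriptA); // true

// codePointAt returns the full code point instead of a single code unit.
console.log(scriptA.codePointAt(0).toString(16)); // 1d49c

// Hypothetical helper: write each stored number as a big-endian 16-bit
// integer, which yields the UTF-16BE byte sequence for the text.
function codeUnitsToBigEndianBytes(codes) {
  const bytes = new Uint8Array(codes.length * 2);
  const view = new DataView(bytes.buffer);
  codes.forEach((code, i) => view.setUint16(i * 2, code, false)); // false = big-endian
  return bytes;
}

console.log(Array.from(codeUnitsToBigEndianBytes([8839]))); // [ 34, 135 ], i.e. 0x22 0x87
```

Storing one number per code unit, as the question describes, therefore keeps working for characters outside the BMP; they simply occupy two stored numbers instead of one.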