I need to generate a “random” 128-byte key (the strength of the randomness is not important at the moment). I do this in JavaScript with the following code:
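```javascript
var buffer = "";
for (var i = 0; i < 128; i++) {
  // Math.floor(x * 256) gives every code 0-255 equal probability;
  // Math.round(x * 255) would make 0 and 255 half as likely as the rest.
  buffer += String.fromCharCode(Math.floor(Math.random() * 256));
}
```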
However, when I send this key to a PHP script via POST, I find that certain characters in my key do not have the same encoding! For example, when I output the character code of ò in JavaScript I get 242, yet the same character has a code of 195 in PHP.
Certain characters, such as A-Z, a-z, and 0-9, have the same code in both JavaScript and PHP.
To output character codes, I use JavaScript’s .charCodeAt() method and PHP’s ord() function.
I was hoping someone could explain to me why the character codes differ. Thank you!
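JavaScript’s .charCodeAt() returns the UTF-16 code unit at the given index, which for characters in the Basic Multilingual Plane (including ò) is the same as the Unicode code point. Strings in JavaScript are sequences of UCS-2/UTF-16 code units.
PHP, on the other hand, treats strings as plain streams of bytes. It doesn’t know much about character sets; by default it effectively treats strings as ASCII or Latin-1 (it is at least binary-safe), and ord() simply returns the value of a single byte.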
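To see the difference between code units and code points on the JavaScript side, here is a small sketch. It assumes an ES2015+ engine (for codePointAt() and \u{...} escapes); the emoji is just an arbitrary character outside the Basic Multilingual Plane, and ò is written as an escape so the file’s own encoding cannot interfere.

```javascript
// charCodeAt() returns a single UTF-16 code unit; codePointAt() returns the full code point.
const o = "\u00f2";                  // "ò", U+00F2, inside the Basic Multilingual Plane
console.log(o.length);               // 1
console.log(o.charCodeAt(0));        // 242 (code unit equals code point here)

const smiley = "\u{1F600}";          // U+1F600, outside the BMP
console.log(smiley.length);          // 2: stored as a surrogate pair
console.log(smiley.charCodeAt(0).toString(16));  // "d83d" (high surrogate only)
console.log(smiley.codePointAt(0).toString(16)); // "1f600"
```

For every character the key generator above can produce (codes 0 to 255), charCodeAt() and codePointAt() agree, so the mismatch with PHP does not come from the JavaScript side.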
Now, parameters transferred via the URL or as form values usually get encoded as UTF-8. That works in PHP, because UTF-8 was specifically designed to work with systems that are unaware of its existence.
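A sketch of what typically arrives at PHP: URL and form parameters are percent-encoded as UTF-8, which can be reproduced with encodeURIComponent() and TextEncoder (a global in browsers and in Node.js 11+). The firstByte line mimics what PHP’s ord($string[0]) sees; it is an illustration, not PHP’s actual code path.

```javascript
const ch = "\u00f2"; // "ò"

// Percent-encoding as used for URL/form parameters: two UTF-8 bytes.
const encoded = encodeURIComponent(ch);
console.log(encoded);                // "%C3%B2"

// The raw UTF-8 bytes PHP ends up with in its string.
const bytes = new TextEncoder().encode(ch);
console.log(Array.from(bytes));      // [195, 178]

// ord($string[0]) in PHP only looks at the first byte.
const firstByte = bytes[0];
console.log(firstByte);              // 195

// ASCII characters encode to one identical byte, which is why A-Z, a-z and 0-9 match.
console.log(new TextEncoder().encode("A")[0], "A".charCodeAt(0)); // 65 65
```

One consequence for the generated key: every character with a code from 128 to 255 becomes two bytes in UTF-8, so strlen() in PHP will usually report more than 128 for the received string.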
The UTF-8 encoding of ò is "\xC3\xB2". So when you access the first character in PHP with $string[0], it only sees the first byte, which is hex C3 or decimal 195.
There are, however, the mbstring (mb_*) functions in PHP to deal with UTF-8 and other multibyte encodings, if you need them. (The workaround here is to convert the string from UTF-8 to UCS-2 and then extract the first 16-bit word to get the Unicode code point. Or there are longwinded approaches like How to get code point number for a given character in a utf-8 string?) Since PHP 7.2, mb_ord() also returns the code point of a UTF-8 character directly.
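The workaround boils down to decoding a UTF-8 byte sequence back into a code point. Here it is sketched in JavaScript rather than PHP, as a hypothetical helper firstCodePoint() that assumes well-formed input of 1 to 4 bytes and does no validation of malformed sequences.

```javascript
// Hypothetical helper: decode the first UTF-8 sequence in a byte array
// into a Unicode code point. Assumes well-formed input; no error handling.
function firstCodePoint(bytes) {
  const b0 = bytes[0];
  if (b0 < 0x80) return b0;                          // 0xxxxxxx: 1 byte (ASCII)
  if ((b0 & 0xe0) === 0xc0) {                        // 110xxxxx 10xxxxxx: 2 bytes
    return ((b0 & 0x1f) << 6) | (bytes[1] & 0x3f);
  }
  if ((b0 & 0xf0) === 0xe0) {                        // 1110xxxx + 2 continuation bytes
    return ((b0 & 0x0f) << 12) | ((bytes[1] & 0x3f) << 6) | (bytes[2] & 0x3f);
  }
  // 11110xxx + 3 continuation bytes
  return ((b0 & 0x07) << 18) | ((bytes[1] & 0x3f) << 12) |
         ((bytes[2] & 0x3f) << 6) | (bytes[3] & 0x3f);
}

// The two bytes PHP sees for "ò": 0xC3 0xB2.
console.log(firstCodePoint([0xc3, 0xb2]));                  // 242
console.log(firstCodePoint(new TextEncoder().encode("A"))); // 65
```

The continuation bytes each contribute their low 6 bits, which is why the second byte (178) is needed to get from 195 back to 242.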