Take a look at the following C# code:
    byte[] StringToBytesToBeHashed(string to_be_hashed) {
        byte[] to_be_hashed_byte_array = new byte[to_be_hashed.Length];
        int i = 0;
        foreach (char cur_char in to_be_hashed)
        {
            to_be_hashed_byte_array[i++] = (byte)cur_char;
        }
        return to_be_hashed_byte_array;
    }
(The function above was extracted from these lines of code in the WMSAuth GitHub repo.)
My question is: what does the cast from char to byte do in terms of encoding?
I guess it does nothing in terms of encoding, but does that mean Encoding.Default is the one being used, so that the returned bytes depend on how the framework encodes the underlying string on the specific operating system?
Besides, is a char actually bigger than a byte (I’m guessing 2 bytes), so that the cast omits the first byte?
I was thinking of replacing all this with:
    Encoding.UTF8.GetBytes(to_be_hashed)
What do you think?
The .NET Framework uses Unicode to represent all its characters and strings. The integer value of a char (which you may obtain by casting to int) is equivalent to its UTF-16 code unit. For characters in the Basic Multilingual Plane (which constitute the majority of characters you’ll ever encounter), this value is the Unicode code point.

Casting a char to byte will result in data loss for any character whose value is larger than 255. A char is 16 bits wide, so in the default unchecked context the cast keeps only the low 8 bits and silently discards the rest; for example, '€' (U+20AC) becomes the single byte 0xAC.

Yes, you should definitely use Encoding.UTF8.GetBytes instead.
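
The difference between the cast and a real encoding can be seen in a minimal sketch like the one below. The class name CastVsEncodingDemo and the helper CastToBytes are made up for illustration. CastToBytes mirrors the function from the question, and the unchecked keyword only makes explicit what the cast does under default compiler settings (in a checked context the same cast would throw an OverflowException instead of truncating).

```csharp
using System;
using System.Text;

public static class CastVsEncodingDemo
{
    // Hypothetical helper: same conversion as the function in the question,
    // producing exactly one byte per char by casting.
    public static byte[] CastToBytes(string s)
    {
        byte[] result = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            // Only the low 8 bits of the 16-bit UTF-16 code unit survive.
            result[i] = unchecked((byte)s[i]);
        }
        return result;
    }

    // Formats bytes as dash-separated uppercase hex, e.g. "E2-82-AC".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        string ascii = "abc";
        string accented = "\u00E9"; // é
        string euro = "\u20AC";     // €

        // ASCII: both approaches agree.
        Console.WriteLine(Hex(CastToBytes(ascii)));               // 61-62-63
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(ascii)));    // 61-62-63

        // U+00E9 fits in one byte, but UTF-8 encodes it as two bytes.
        Console.WriteLine(Hex(CastToBytes(accented)));            // E9
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(accented))); // C3-A9

        // U+20AC does not fit in one byte: the cast drops the high byte 0x20.
        Console.WriteLine(Hex(CastToBytes(euro)));                // AC
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(euro)));     // E2-82-AC
    }
}
```

For characters up to U+00FF the cast happens to produce the same bytes as ISO-8859-1 (Latin-1), and nothing in it depends on Encoding.Default or the operating system. For pure ASCII input both approaches yield identical bytes, so switching to Encoding.UTF8.GetBytes only changes the resulting hash when the string contains non-ASCII characters.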