I have lately been trying to read about how everything with strings and encodings works.
My question is about this method:
public static byte[] Convert(
Encoding srcEncoding,
Encoding dstEncoding,
byte[] bytes
)
What is actually going on behind the scenes? Is it using a StringBuilder to check each char and then replacing them according to the specified Encoding, or something else?
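I’d expect it to be effectively a decode of the bytes to a string using the source encoding, followed by an encode of that string using the destination encoding.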
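A minimal sketch of that two-step equivalence follows. It is an assumption about the observable behaviour, not the actual framework source, and `ConvertSketch` and `ConvertViaString` are hypothetical names used only for illustration:

```csharp
using System.Text;

public static class ConvertSketch
{
    // Hypothetical helper: decode with the source encoding, then
    // re-encode with the destination encoding.
    public static byte[] ConvertViaString(
        Encoding srcEncoding,
        Encoding dstEncoding,
        byte[] bytes)
    {
        // Step 1: binary data -> text (UTF-16 code units in a string).
        string text = srcEncoding.GetString(bytes);

        // Step 2: text -> binary data in the other encoding.
        return dstEncoding.GetBytes(text);
    }
}
```

The whole text is held in memory as a string between the two steps, which is what a more careful implementation could avoid.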
Now it may do it in a more memory-efficient way than that, but effectively it needs to decode the original binary data to text and then encode that text again as binary data in the other encoding.
Note that performing encoding on a character-by-character basis doesn’t always work – for example, one UTF-8 byte sequence may decode to a single Unicode code point represented as a surrogate pair of UTF-16 code units (`char` values). Using an `Encoder` and `Decoder` pair would allow "chunks" of data to be encoded/decoded at a time, removing the need for the whole text data to be in memory at one time... possibly writing to a `MemoryStream` or something similar to store the encoded data.
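To make the chunked idea concrete, here is a rough sketch of how it might look. This is not how the framework is known to do it; `ChunkedConvertSketch`, `ConvertInChunks` and the `chunkSize` parameter are hypothetical. The stateful `Decoder` and `Encoder` remember incomplete byte sequences and unpaired surrogates between calls, so a chunk boundary can fall in the middle of a character without corrupting the result:

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedConvertSketch
{
    // Hypothetical helper: converts the input a chunk at a time.
    // chunkSize must be positive.
    public static byte[] ConvertInChunks(
        Encoding srcEncoding,
        Encoding dstEncoding,
        byte[] bytes,
        int chunkSize = 4096)
    {
        // Stateful converters: they carry partial characters across calls.
        Decoder decoder = srcEncoding.GetDecoder();
        Encoder encoder = dstEncoding.GetEncoder();

        // Worst-case buffer sizes for one chunk.
        char[] chars = new char[srcEncoding.GetMaxCharCount(chunkSize)];
        byte[] outBytes = new byte[dstEncoding.GetMaxByteCount(chars.Length)];

        using (var output = new MemoryStream())
        {
            for (int offset = 0; offset < bytes.Length; offset += chunkSize)
            {
                int count = Math.Min(chunkSize, bytes.Length - offset);

                // Flush any pending state only on the final chunk.
                bool last = offset + count >= bytes.Length;

                int charCount = decoder.GetChars(bytes, offset, count, chars, 0, last);
                int byteCount = encoder.GetBytes(chars, 0, charCount, outBytes, 0, last);
                output.Write(outBytes, 0, byteCount);
            }

            return output.ToArray();
        }
    }
}
```

Here the input is still a single array, so the saving is only in the intermediate text; reading the source from a stream in the same loop would avoid holding the whole input too.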