I am facing a very strange problem: I have a byte[], and when I pass it to the Encoding.UTF8.GetString(byte[] bytes) method, the decoding mangles my bytes, replacing just a few special bytes (which I use as markers in my system) with some three-character string representation.
[0] 70 byte
[1] 49 byte
[2] 45 byte
[3] 86 byte
[4] 49 byte
[5] 253 byte <-- Special byte
[6] 70 byte
[7] 49 byte
[8] 45 byte
[9] 86 byte
[10] 50 byte
[11] 253 byte <-- Special byte
[12] 70 byte
[13] 49 byte
[14] 45 byte
[15] 86 byte
[16] 51 byte
When I pass the above byte[] to the Encoding.UTF8.GetString(bytes) method, I get the following output:
private Encoding _encoding = System.Text.Encoding.GetEncoding("UTF-8", new EncoderReplacementFallback("?"), new DecoderReplacementFallback("?"));
_encoding.GetString(bytes) "F1-V1�F1-V2�F1-V3" string
The actual value should not contain ‘�’, as this means the decoder failed to decode those special bytes and replaced them with ‘�’. Is there any way I can get around this, i.e. convert to a string and keep each special byte represented as a single char?
I have the following special bytes, which I am trying to use as markers:
byte AM = (byte) 254;
byte VM = (byte) 253;
byte SM = (byte) 252;
Your help and comments will be appreciated.
Thanks,
—
Sheeraz
The data is only UTF-8 between the markers, so if it were me I would extract the delimited portions first and then UTF-8 decode each portion separately, i.e. read through the byte[] looking for the markers in your binary data, giving you 3 binary chunks (70,49,45,86,49; 70,49,45,86,50; 70,49,45,86,51), which are then decoded into 3 strings. You can’t UTF-8 decode the entire binary sequence because it is not valid UTF-8.
However, personally, I would say that using a delimiter is dangerous here; I would probably go for a length-prefix approach, so that
For example, if we used a “varint” length prefix, that would be:
where the 05 is the “varint” length, which we interpret as 5 bytes; this means we can process nicely:
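The two approaches described in this answer could be sketched roughly as follows. This is only an illustration, not the original code from the answer: the class name MarkerDemo and the helpers SplitAndDecode, WriteLengthPrefixed and ReadLengthPrefixed are hypothetical names made up for the example, the splitter assumes a single marker byte (253 here), and the varint is assumed to be the usual 7-bits-per-byte, low-bits-first encoding.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MarkerDemo
{
    // Hypothetical helper: cut the buffer at each marker byte and
    // UTF-8 decode every chunk on its own, so the marker never reaches the decoder.
    public static List<string> SplitAndDecode(byte[] data, byte marker)
    {
        var parts = new List<string>();
        int start = 0;
        for (int i = 0; i <= data.Length; i++)
        {
            if (i == data.Length || data[i] == marker)
            {
                parts.Add(Encoding.UTF8.GetString(data, start, i - start));
                start = i + 1;
            }
        }
        return parts;
    }

    // Hypothetical helper: write each value as <varint byte count><UTF-8 bytes>.
    public static byte[] WriteLengthPrefixed(IEnumerable<string> values)
    {
        using (var ms = new MemoryStream())
        {
            foreach (string value in values)
            {
                byte[] payload = Encoding.UTF8.GetBytes(value);
                uint len = (uint)payload.Length;
                while (len >= 0x80)
                {
                    // Low 7 bits first; the high bit means "more bytes follow".
                    ms.WriteByte((byte)((len & 0x7F) | 0x80));
                    len >>= 7;
                }
                ms.WriteByte((byte)len);
                ms.Write(payload, 0, payload.Length);
            }
            return ms.ToArray();
        }
    }

    // Hypothetical helper: read the values back without scanning for delimiters.
    // Assumes well-formed input; a truncated buffer will throw.
    public static List<string> ReadLengthPrefixed(byte[] data)
    {
        var parts = new List<string>();
        int pos = 0;
        while (pos < data.Length)
        {
            int len = 0, shift = 0;
            byte b;
            do
            {
                b = data[pos++];
                len |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);

            parts.Add(Encoding.UTF8.GetString(data, pos, len));
            pos += len;
        }
        return parts;
    }

    public static void Main()
    {
        // The byte[] from the question, with 253 as the marker.
        byte[] raw = { 70, 49, 45, 86, 49, 253, 70, 49, 45, 86, 50, 253, 70, 49, 45, 86, 51 };

        List<string> fields = SplitAndDecode(raw, 253);
        Console.WriteLine(string.Join("|", fields)); // F1-V1|F1-V2|F1-V3

        byte[] framed = WriteLengthPrefixed(fields);
        Console.WriteLine(BitConverter.ToString(framed)); // hex dump; each value starts with 05
        Console.WriteLine(string.Join("|", ReadLengthPrefixed(framed)));
    }
}
```

With the length prefix, the reader always knows how many bytes belong to the next value, so the payload can contain any byte at all, including 252 to 254.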