I have the following code, converted to C# from an old VB6 program. The VB6 program used the old Winsock control, which accepted a String argument, but the C# program uses System.Net.Sockets.Socket, which wants a byte array:
```
byte[] msg = Encoding.UTF8.GetBytes(tempString);
_TCPConn.Send(msg);
```
`tempString` contains:

```
0x0002 ('\u0002')
0x0000 ('\0')
0x0000 ('\0')
0x0000 ('\0')
0x0080 ('\u0080')
0x006d ('m')
0x0068 ('h')
```
But `msg` gets an extra byte:

```
0x02
0x00
0x00
0x00
0xc2   <-- extra byte
0x80
0x6d
0x68
```
Where is that 0xC2 byte coming from?
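That is what UTF-8 does. Characters 0x00 to 0x7F are encoded as a single byte, which is why the other characters come through unchanged. Values from 0x80 to 0x7FF are encoded with 2 bytes, and values from 0x800 to 0xFFFF with 3 bytes. The 2-byte form is `110xxxxx 10xxxxxx`: the character's 11 bits are split 5/6 across the two bytes, so U+0080 (binary `00010 000000`) becomes 0xC2 0x80. A decoder that reads 0xC2 0x80 outputs just 0x80.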
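To check that arithmetic, here is a minimal C# sketch that encodes a character in the 2-byte range by hand, next to a helper that shows what `Encoding.UTF8` produces for the whole string. `Utf8Demo`, `EncodeTwoByte` and `EncodeHex` are hypothetical names for illustration; real code should simply call `Encoding.UTF8.GetBytes`.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Hand-rolled UTF-8 encoding for the two-byte range only (U+0080..U+07FF),
    // written to show where the 0xC2 comes from. Not a general encoder.
    public static byte[] EncodeTwoByte(char c)
    {
        if (c < 0x80 || c > 0x7FF)
            throw new ArgumentOutOfRangeException(nameof(c), "Only U+0080..U+07FF use two bytes.");

        // Lead byte 110xxxxx carries the top 5 of the 11 payload bits.
        byte lead = (byte)(0xC0 | (c >> 6));
        // Continuation byte 10xxxxxx carries the low 6 bits.
        byte cont = (byte)(0x80 | (c & 0x3F));
        return new[] { lead, cont };
    }

    // What the framework's UTF-8 encoder produces, as "XX-XX-..." hex.
    public static string EncodeHex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));
}
```

For the string in the question, `Utf8Demo.EncodeHex("\u0002\0\0\0\u0080mh")` returns `"02-00-00-00-C2-80-6D-68"`, and `EncodeTwoByte('\u0080')` returns the same `C2 80` pair.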
Edit: If the receiver expects only the low byte of each character, and character values 0x80-0xFF are valid, UTF-8 is the wrong encoding; you will have to convert each character to a single byte yourself, one at a time.
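A minimal sketch of that per-character conversion, assuming the receiver wants exactly one byte per character (`ByteStringConverter`, `ToSingleBytes` and `FromSingleBytes` are hypothetical names). Each `char` is cast to `byte`, and anything above 0xFF is rejected rather than silently truncated.

```csharp
using System;

public static class ByteStringConverter
{
    // Maps each char to a single byte, so U+0080..U+00FF are sent as
    // 0x80..0xFF instead of UTF-8 two-byte sequences.
    // Assumption: the receiver expects one byte per character.
    public static byte[] ToSingleBytes(string s)
    {
        var bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c > 0xFF)
                throw new ArgumentException($"Character U+{(int)c:X4} at index {i} does not fit in one byte.", nameof(s));
            bytes[i] = (byte)c;
        }
        return bytes;
    }

    // Reverse direction, for data read back from the socket.
    public static string FromSingleBytes(byte[] bytes)
    {
        var chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            chars[i] = (char)bytes[i];
        return new string(chars);
    }
}
```

The send then becomes `_TCPConn.Send(ByteStringConverter.ToSingleBytes(tempString));`. On .NET 5 and later, `Encoding.Latin1` performs the same one-to-one mapping for U+0000-U+00FF; the explicit loop is shown here because it makes the out-of-range case an exception you can see.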