The project I’m currently working on needs to interface with a client system that we don’t make, so we have no control over how data is sent either way. The problem is that we’re working in C#, which doesn’t seem to have any support for UCS-2 and has very little support for big-endian data (as far as I can tell).
What I would like to know is whether there’s anything I overlooked in .NET, or something that someone else has made and released that we can use. If not, I will take a crack at encoding/decoding it in a custom method, if that’s even possible.
But thanks for your time either way.
EDIT:
BigEndianUnicode does decode the string correctly; the problem was in receiving the other data as big-endian. So far, using IPAddress.HostToNetworkOrder() as suggested elsewhere has allowed me to decode half of the string (Merli? is what comes up, and it should be Merlin33069).
I’m combing through the short code to see if there’s another length variable I missed.
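For reference, here is a minimal sketch of reading a big-endian 16-bit length field in C#. The class and method names are made up for illustration, and the two-byte field width is an assumption about the protocol. On a little-endian machine, IPAddress.NetworkToHostOrder() and IPAddress.HostToNetworkOrder() perform the same byte swap; the former is simply the conventionally named one for incoming data.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of any library.
public static class BigEndianLength
{
    // Option A: let BitConverter read in host order, then convert from
    // network (big-endian) order. On a big-endian host the swap is a no-op.
    public static short ReadInt16ViaNetworkOrder(byte[] buffer, int offset)
    {
        short hostOrdered = BitConverter.ToInt16(buffer, offset);
        return IPAddress.NetworkToHostOrder(hostOrdered);
    }

    // Option B: assemble the value by hand; the first byte is the most
    // significant one. This works regardless of host endianness.
    public static short ReadInt16Manually(byte[] buffer, int offset)
    {
        return (short)((buffer[offset] << 8) | buffer[offset + 1]);
    }
}
```

With the bytes 0x00 0x0B, both methods return 11, which is the number of characters in Merlin33069.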
RESOLUTION:
After working out that the big-endian variables were the main problem, I went back through and reviewed the details, and it seems that the length of the strings was sent as a character count, not a byte count (in UTF-16, it would seem, a char is two bytes). All I needed to do was double it, and it worked out. Thank you all for your help.
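A rough sketch of what that fix looks like, in case it helps someone else. The wire layout here (a 2-byte big-endian character count followed by big-endian UTF-16 text) is my reading of this particular protocol, and the class and method names are hypothetical. Treating a count of 11 as bytes instead of characters leaves only five complete characters plus a stray byte, which would explain the Merli? result.

```csharp
using System;
using System.Text;

// Hypothetical helper; the layout is an assumption:
// [2-byte big-endian character count][count * 2 bytes of big-endian UTF-16]
public static class WireStrings
{
    public static string ReadString(byte[] buffer, int offset, out int bytesConsumed)
    {
        // The length prefix counts characters, not bytes.
        int charCount = (buffer[offset] << 8) | buffer[offset + 1];

        // Each UTF-16 code unit occupies two bytes, so double the count.
        int byteCount = charCount * 2;

        string text = Encoding.BigEndianUnicode.GetString(buffer, offset + 2, byteCount);
        bytesConsumed = 2 + byteCount;
        return text;
    }

    public static byte[] WriteString(string text)
    {
        byte[] payload = Encoding.BigEndianUnicode.GetBytes(text);
        byte[] result = new byte[2 + payload.Length];

        // string.Length is the number of UTF-16 code units.
        result[0] = (byte)(text.Length >> 8);
        result[1] = (byte)text.Length;

        Buffer.BlockCopy(payload, 0, result, 2, payload.Length);
        return result;
    }
}
```

Writing Merlin33069 with WriteString produces 24 bytes (2 for the count, 22 for the text), and ReadString turns them back into the original string.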
EDIT: Now we know that the problem isn’t in the encoding of the text data but in the encoding of the length. There are a few options:
- The built-in BitConverter code (which I assume is what you’re using now; that or BinaryReader), with the bytes reversed before conversion.
- The EndianBitConverter or EndianBinaryReader classes from MiscUtil, which are like BitConverter and BinaryReader, but let you specify the endianness.

You may be looking for Encoding.BigEndianUnicode. That’s the big-endian UTF-16 encoding, which isn’t strictly speaking the same as UCS-2 (as pointed out by Marc) but should be fine unless you give it strings including characters outside the BMP (i.e. above U+FFFF), which can’t be represented in UCS-2 but are represented in UTF-16.
I find it highly unlikely that the client system is sending you characters where there’s a difference (which is basically the surrogate pairs, which are permanently reserved for that use anyway).
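If you want to verify that assumption rather than trust it, a small check along these lines would do. This is only a sketch, and the class and method names are hypothetical: a string is safe to treat as UCS-2 when none of its UTF-16 code units is a surrogate.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of .NET or MiscUtil.
public static class Utf16VersusUcs2
{
    // Returns true when every code unit is a plain BMP character, i.e. the
    // string encodes to the same bytes in UCS-2 and in UTF-16.
    public static bool IsUcs2Safe(string text)
    {
        foreach (char c in text)
        {
            if (char.IsSurrogate(c))
            {
                return false;
            }
        }
        return true;
    }

    // Number of bytes Encoding.BigEndianUnicode produces for the string.
    public static int EncodedLength(string text)
    {
        return Encoding.BigEndianUnicode.GetByteCount(text);
    }
}
```

For example, IsUcs2Safe("Merlin33069") is true and the string encodes to 22 bytes, whereas a single character above U+FFFF such as "\U0001F600" fails the check and encodes to 4 bytes because it needs a surrogate pair.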