Most texts on the C++ standard library mention wstring as being the equivalent of string, except parameterized on wchar_t instead of char, and then proceed to demonstrate string only.
Well, sometimes there are specific quirks, and here is one: I can’t seem to assign a wstring from a NUL-terminated array of 16-bit characters. The assignment happily treats the null character, and whatever garbage follows it, as actual character data. Here is a very small reduction:
typedef unsigned short PA_Unichar;
PA_Unichar arr[256];
fill(arr); // sets to 52 00 4b 00 44 00 61 00 74 00 61 00 00 00 7a 00 7a 00 7a 00
// now arr contains "RKData\0zzz" in its 10 first values
wstring ws;
ws.assign((const wchar_t *)arr);
int l = ws.length();
At this point l is not the expected 6 (the number of characters in “RKData”), but much larger. In my test run it is 29. Why 29? No idea; a memory dump doesn’t show any distinctive value at the 29th character.
So the question: is this a bug in my standard C++ library (Mac OS X Snow Leopard), or a bug in my code?
How am I supposed to assign a null-terminated array of 16-bit chars to a wstring?
Thanks
Under most Unixes (Mac OS X included), wchar_t holds a single UTF-32 code point, not a 16-bit UTF-16 code unit as on Windows. Your cast therefore makes the library read the buffer four bytes at a time: pairs of 16-bit units fuse into one wchar_t, the 16-bit terminator disappears inside such a pair, and scanning runs on until an aligned 32-bit zero happens to occur in memory, hence the arbitrary length of 29. So you need to do one of two things.
Either copy the 16-bit units one by one, treating arr as an iterator range so that each short int is widened to one wchar_t. This works only if all your characters lie in the BMP, i.e. the data really is UCS-2 (the 16-bit legacy encoding) with no surrogate pairs.
Or handle UTF-16 properly by converting it to UTF-32: scan for surrogate pairs and merge each pair into a single code point.