I have code that does something like this:
    char16_t msg[256] = {0};
    //...
    wstring wstr;
    for (int i = 0; i < len; ++i)
    {
        if ((unsigned short)msg[i] != 167)
            wstr.push_back((wchar_t)msg[i]);
        else
            wstr.append(L"_<?>_");
    }
As you can see, it uses some rather ugly hardcoding (I’m not sure it is correct, but it works for my data) to figure out whether the wchar_t cast “failed” (167 being what I take to be the value of the replacement character).
From Wikipedia:
The replacement character � (often a black diamond with a white
question mark) is a symbol found in the Unicode standard at codepoint
U+FFFD in the Specials table. It is used to indicate problems when a
system is not able to decode a stream of data to a correct symbol. It
is most commonly seen when a font does not contain a character, but is
also seen when the data is invalid and does not match any character:
So I have 2 questions:
1. Is there a proper way to do this nicely?
2. Are there other characters, like the replacement character, that signal a failed conversion?
EDIT: I use gcc on Linux, so wchar_t is 32 bits, and the reason I need this cast to work is that weird wstrings kill my glog library. 🙂 Also, wcout dies. 🙁 🙂
Doesn’t work like that.
wchar_t and char16_t are both integer types in C++. Casting from one to the other follows the usual rules for integer conversions; it does not attempt to convert between charsets in any way, or verify that anything is a genuine Unicode code point.
Any replacement characters will have to come from more sophisticated code than a simple cast (or could be from the original input, of course).
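To illustrate the point about integer conversions, here is a minimal sketch. The helper names widen_by_cast and check_cast_preserves_values are made up for this example and are not part of the original code. Note that 167 is 0x00A7, the section sign, while the replacement character U+FFFD is 65533. Neither value is produced by the cast; both simply pass through unchanged, as does an invalid lone surrogate.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: widen each 16-bit unit with a plain cast,
// which is what the loop in the question does.
std::wstring widen_by_cast(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (char16_t unit : in) {
        // Integer conversion only: no charset conversion, no validation.
        out.push_back(static_cast<wchar_t>(unit));
    }
    return out;
}

// Hypothetical self-check showing that every value survives as-is.
void check_cast_preserves_values() {
    // 0x00A7 (decimal 167) is the section sign, 0xFFFD is the replacement
    // character, and 0xD800 is a lone surrogate (not a valid character).
    const std::u16string in = {char16_t(0x00A7), char16_t(0xFFFD), char16_t(0xD800)};
    const std::wstring out = widen_by_cast(in);
    assert(out.size() == 3);
    assert(out[0] == 0x00A7);
    assert(out[1] == 0xFFFD);
    assert(out[2] == 0xD800);  // invalid input passes through untouched
}
```

So if 167 or U+FFFD shows up in the wstring, it was already in msg before the cast.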
Provided that:
1. msg is a sequence of code points in the BMP
2. wchar_t in your implementation is at least 16 bits and the wide character set used by your implementation is Unicode (or a 16-bit version of Unicode, whether that’s BMP-only, or UTF-16).
Then the code you have should work fine. It will not validate the input, though, just copy the values.
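If the input may contain data outside the BMP or malformed UTF-16, one possible approach is to decode it explicitly and substitute U+FFFD yourself. The following is only a sketch under stated assumptions: wchar_t is 32 bits wide (as with gcc on Linux, per the question), msg holds UTF-16 code units, and the function names utf16_to_wstring and check_utf16_to_wstring are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 code units into a wstring,
// replacing unpaired surrogates with U+FFFD.
std::wstring utf16_to_wstring(const char16_t* msg, std::size_t len) {
    static_assert(sizeof(wchar_t) >= 4, "this sketch assumes a 32-bit wchar_t");
    const char32_t kReplacement = 0xFFFD;
    std::wstring out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        char32_t cp = msg[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: valid only when followed by a low surrogate.
            if (i + 1 < len && msg[i + 1] >= 0xDC00 && msg[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (msg[i + 1] - 0xDC00);
                ++i;  // the pair consumed two code units
            } else {
                cp = kReplacement;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = kReplacement;  // low surrogate without a preceding high one
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}

// Hypothetical self-check: one valid pair, two unpaired surrogates.
void check_utf16_to_wstring() {
    const char16_t in[] = {0x0041, 0xD83D, 0xDE42, 0xD800, 0x0042, 0xDC00};
    const std::wstring out = utf16_to_wstring(in, 6);
    assert(out.size() == 5);
    assert(out[0] == 0x0041);
    assert(out[1] == 0x1F642);  // surrogate pair combined into one code point
    assert(out[2] == 0xFFFD);   // unpaired high surrogate replaced
    assert(out[3] == 0x0042);
    assert(out[4] == 0xFFFD);   // unpaired low surrogate replaced
}
```

With a decoder like this, the caller can test for U+FFFD (or swap in a marker string such as the "_<?>_" from the question) instead of comparing against a hardcoded 167. Whether glog or wcout then accept the result also depends on the locale in use, which this sketch does not address.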