I want an option to convert a string to a wide string with two different behaviors:
- Ignore illegal characters
- Abort the conversion if an illegal character occurs
On Windows XP I could do this:
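bool ignore_illegal; // input
wchar_t buf[256];    // output buffer; the size is arbitrary for this example
DWORD flags = ignore_illegal ? 0 : MB_ERR_INVALID_CHARS;
SetLastError(0);
// the last argument is a count of wchar_t elements, not bytes
int res = MultiByteToWideChar(CP_UTF8, flags, "test\xFF\xFF test", -1, buf, sizeof(buf) / sizeof(buf[0]));
DWORD err = GetLastError();
std::cout << "result = " << res << " get last error = " << err;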
Now, on XP, if ignore_illegal is true I would get:
result = 10 get last error = 0
And if ignore_illegal is false I get:
result = 0 get last error = 1113 // ERROR_NO_UNICODE_TRANSLATION
So, given a big enough buffer, it is enough to check result != 0.
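The "big enough buffer" condition can be avoided by asking the API for the required size first. What follows is a minimal sketch of that two-call pattern, assuming a NUL-terminated input; the helper name utf8_to_wide is hypothetical, and the block is guarded so it only compiles on Windows.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Hypothetical helper (not part of the Windows API).
// Converts NUL-terminated UTF-8 to UTF-16 and returns false when
// MultiByteToWideChar reports failure, i.e. returns 0.
bool utf8_to_wide(const char* utf8, bool ignore_illegal, std::wstring& out)
{
    const DWORD flags = ignore_illegal ? 0 : MB_ERR_INVALID_CHARS;

    // First call: cchWideChar == 0 asks for the required size in wchar_t
    // elements. Because cbMultiByte is -1, the count includes the NUL.
    const int needed = MultiByteToWideChar(CP_UTF8, flags, utf8, -1, NULL, 0);
    if (needed == 0)
        return false;  // GetLastError() holds the reason, e.g. 1113

    // Second call: convert into a buffer of exactly that size.
    std::wstring buf(static_cast<size_t>(needed), L'\0');
    const int written =
        MultiByteToWideChar(CP_UTF8, flags, utf8, -1, &buf[0], needed);
    if (written == 0)
        return false;

    buf.resize(static_cast<size_t>(written) - 1);  // drop the trailing NUL
    out.swap(buf);
    return true;
}
#endif
```

With this shape the caller only checks the boolean result. GetLastError() can still be inspected after a failure, since the helper makes no other API calls after the failing one.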
According to the documentation at http://msdn.microsoft.com/en-us/library/dd319072(VS.85).aspx
there are API changes, so how does this behavior change on Vista?
I think what it does is replace illegal code units with the replacement character (U+FFFD), as the Unicode standard recommends. The following code produces the following output on my Windows 7 system:
So the error codes stay the same, but the length differs by two, which accounts for the two replacement code points that have been inserted. If you run my code on XP, the fifth code point should be U+0020 (the space character), provided the two illegal code units have been dropped.
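To make the difference between the three behaviors concrete, here is a small portable sketch that does not call MultiByteToWideChar at all. The names Policy and decode_utf8 are made up for illustration, and the decoder is deliberately simplified (it does not reject overlong forms or surrogates), so treat it as a model of the policies rather than of what Windows does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names, for illustration only.
enum class Policy { Drop, Replace, Strict };

// Decodes UTF-8 into code points according to the chosen policy.
// Returns false only under Policy::Strict, when an illegal byte is found;
// in that case `out` holds whatever was decoded before the failure.
bool decode_utf8(const std::string& in, Policy policy, std::u32string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;  // 0 means "not a valid lead byte"
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }

        // Every trailing byte must look like 10xxxxxx.
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            ok = (c & 0xC0) == 0x80;
            cp = (cp << 6) | (c & 0x3F);
        }

        if (ok) {
            assert(len >= 1 && len <= 4);
            out.push_back(cp);
            i += len;
            continue;
        }
        if (policy == Policy::Strict)
            return false;                                 // abort
        if (policy == Policy::Replace)
            out.push_back(static_cast<char32_t>(0xFFFD)); // substitute
        ++i;  // skip exactly one illegal byte, then resynchronise
    }
    return true;
}
```

For the input "test\xFF\xFF test", Policy::Drop yields 9 code points with a space at index 4, and Policy::Replace yields 11 code points with U+FFFD at indices 4 and 5. Adding the terminating NUL gives 10 and 12, which is the difference of two described above.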