There are several functions that convert between ANSI and Unicode: WideCharToMultiByte, MultiByteToWideChar, A2W and W2A.
What I don’t understand is how A2W and W2A work. When you convert one thing into another without losing information, you need two sets, A and B, such that each element of A maps to exactly one element of B and no two elements of A map to the same element of B. In this respect there are several problems:
- ANSI uses one byte per character and Unicode at least two, which means not every element of the Unicode set can be mapped uniquely to ANSI.
- The sets ANSI and Unicode are not strictly defined; there are different encodings for both.
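The first point can be illustrated with a small sketch. It is not Windows code: it assumes Python's built-in cp1252 codec as a stand-in for the ANSI code page of an English-locale system, and uses the "replace" error handler as a stand-in for a default character. The helper name to_ansi is hypothetical.

```python
def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    # Unmappable characters become b"?" (a stand-in for a default character).
    return text.encode(codepage, errors="replace")


# Three distinct inputs collapse to the same output byte, so the mapping
# is not one-to-one and the original text cannot be recovered.
collapsed = {s: to_ansi(s) for s in ["あ", "ア", "?"]}
print(collapsed)  # {'あ': b'?', 'ア': b'?', '?': b'?'}

# Count the code points in U+0000..U+FFFF that cp1252 can encode at all.
# There are only 256 byte values, so the count cannot exceed 256.
representable = 0
for cp in range(0x10000):
    try:
        chr(cp).encode("cp1252")
        representable += 1
    except UnicodeEncodeError:
        pass
print(representable, "of", 0x10000)
```

Once several inputs share one output, no conversion in the other direction can tell them apart again.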
So my question is: how can we convert between them and be sure that we have not corrupted the data?
As others have mentioned, there is no such character set as ‘ANSI’. Unfortunately, the Windows API uses that name for CP_ACP, the ‘ANSI code page’, which refers to one of several character sets depending on which non-Unicode locale is selected on your machine.

That said, with regard to your original question: no, you cannot always round-trip between CP_ACP and a Unicode encoding. For example, there is no equivalent for あ in CP_ACP on an English-locale Windows system.

When this happens, WideCharToMultiByte replaces the character that has no equivalent with lpDefaultChar, if set (otherwise with the system default character), and sets *lpUsedDefaultChar to TRUE. You can pass a pointer to a BOOL variable in lpUsedDefaultChar and check it after the call to see whether your string contained untranslatable characters. In the other direction, MultiByteToWideChar never fails as long as the input is valid in your local code page. To try to detect invalid text, pass the MB_ERR_INVALID_CHARS flag and check for an error (the call returns 0 and GetLastError returns ERROR_NO_UNICODE_TRANSLATION). Even so, text that is actually in some other code page will not necessarily produce an error: it is hard to tell whether the text is truly invalid or merely gibberish.
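Both checks can be modeled portably. This is a minimal sketch, not the Win32 API: wide_to_ansi and ansi_to_wide_strict are hypothetical helpers built on Python's codecs, with cp1252 and cp932 (Shift-JIS) assumed to stand in for the ANSI code pages of English- and Japanese-locale systems. The boolean returned by wide_to_ansi plays the role of *lpUsedDefaultChar, and strict decoding plays the role of MB_ERR_INVALID_CHARS.

```python
from typing import Optional, Tuple


def wide_to_ansi(text: str, codepage: str = "cp1252",
                 default: str = "?") -> Tuple[bytes, bool]:
    """Hypothetical model of WideCharToMultiByte with lpDefaultChar and
    lpUsedDefaultChar: returns the encoded bytes and whether any character
    had to be replaced by the default character."""
    out = bytearray()
    used_default = False
    for ch in text:
        try:
            out += ch.encode(codepage)  # strict: raises if unmappable
        except UnicodeEncodeError:
            out += default.encode(codepage)
            used_default = True
    return bytes(out), used_default


def ansi_to_wide_strict(data: bytes,
                        codepage: str = "cp1252") -> Optional[str]:
    """Hypothetical model of MultiByteToWideChar with MB_ERR_INVALID_CHARS:
    returns None instead of text when the bytes are invalid in the code page."""
    try:
        return data.decode(codepage)  # strict error handling by default
    except UnicodeDecodeError:
        return None


# Lossy narrowing is detectable: 'あ' has no cp1252 equivalent.
print(wide_to_ansi("café あ"))  # (b'caf\xe9 ?', True)

# A lone Shift-JIS lead byte is invalid, so strict decoding reports it.
print(ansi_to_wide_strict(b"\x82", "cp932"))  # None

# Shift-JIS text decoded with the wrong code page raises no error:
# the bytes are valid cp1252, they just mean something else.
sjis = "あ".encode("cp932")  # b'\x82\xa0'
garbled = ansi_to_wide_strict(sjis, "cp1252")
print(garbled is not None, garbled == "あ")  # True False
```

The last case is the limitation described above: a successful decode only proves the bytes are valid in the chosen code page, not that it is the code page the text was written in.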