I have a problem with wchar_t* to char* conversion.
I’m getting a wchar_t* string from the FILE_NOTIFY_INFORMATION structure, returned by the ReadDirectoryChangesW WinAPI function, so I assume that string is correct.
Assume that the wide string is “New Text File.txt”.
In the Visual Studio debugger, hovering over the variable shows “N” followed by some unknown Chinese characters, though the watch window displays the string correctly.
When I try to convert the wide string to char with wcstombs:
wcstombs(pfileName, pwfileName, fileInfo.FileNameLength);
it converts just two letters to char* (“Ne”) and then generates an error.
The error occurs in wcstombs.c, in the function _wcstombs_l_helper(), at this block:
if (*pwcs > 255) /* validate high byte */
{
    errno = EILSEQ;
    return (size_t)-1; /* error */
}
It isn’t thrown as an exception.
What could be the problem?
In order to do what you’re trying to do The Right Way, there are several nontrivial things that you need to take into account. I’ll do my best to break them down for you here.
Let’s start with the definition of the count parameter from the wcstombs() function’s documentation on MSDN: it is the maximum number of bytes that can be stored in the multibyte output string.

Note that this does NOT say anything about the number of wide characters in the wide character input string. Even though all of the wide characters in your example input string (“New Text File.txt”) can be represented as single-byte ASCII characters, we cannot assume that each wide character in the input string will generate exactly one byte in the output string for every possible input string (if this statement confuses you, you should check out Joel’s article on Unicode and character sets). So, if you pass wcstombs() the size of the output buffer, how does it know how long the input string is? The documentation states that the input string is expected to be null-terminated, as per the standard C language convention: conversion stops once wcstombs() encounters the wide null character (L'\0').

Though this isn’t explicitly stated in the documentation, we can infer that if the input string isn’t null-terminated, wcstombs() will keep reading wide characters until it has written count bytes to the output string. So if you’re dealing with a wide character string that isn’t null-terminated, it isn’t enough to just know how long the input string is; you would have to somehow know exactly how many bytes the output string would need to be (which is impossible to determine without doing the conversion) and pass that as the count parameter to make wcstombs() do what you want it to do.

Why am I focusing so much on this null-termination issue? Because the FILE_NOTIFY_INFORMATION structure’s documentation on MSDN has this to say about its FileName field: it is a variable-length field holding the file name in Unicode, and it is not null-terminated.

The fact that the FileName field isn’t null-terminated explains why it has a bunch of “unknown Chinese letters” at the end of it when you look at it in the debugger. The FILE_NOTIFY_INFORMATION structure’s documentation also contains another nugget of wisdom regarding the FileNameLength field: it gives the size of the file name portion of the record in bytes.

Note that this says bytes, not characters. Therefore, even if you wanted to assume that each wide character in the input string will generate exactly one byte in the output string, you shouldn’t be passing fileInfo.FileNameLength for count; you should be passing fileInfo.FileNameLength / sizeof(WCHAR) (or use a null-terminated input string, of course). Putting all of this information together, we can finally understand why your original call to wcstombs() was failing: it was reading past the end of the string and choking on invalid data (thereby triggering the EILSEQ error).

Now that we’ve elucidated the problem, it’s time to talk about a possible solution. In order to do this The Right Way, the first thing you need to know is how big your output buffer needs to be. Luckily, there is one final tidbit in the documentation for wcstombs() that will help us out here: if the output string argument is NULL, wcstombs() returns the required size of the destination string in bytes.

So the idiomatic way to use the wcstombs() function is to call it twice: the first time to determine how big your output buffer needs to be, and the second time to actually do the conversion. The final thing to note is that, as we stated previously, the wide character input string needs to be null-terminated for at least the first call to wcstombs().

Putting this all together, here is a snippet of code that does what you are trying to do:
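The original snippet did not survive in this copy of the answer, so what follows is a minimal sketch of the approach described above, not the author’s exact code. The helper names MakeNullTerminatedCopy and ConvertToMultiByte are hypothetical, and plain wchar_t and std::size_t stand in for WCHAR and DWORD so that the sketch does not depend on <windows.h>. It also assumes, as the documentation quoted above says, that wcstombs() returns the required size when given a null output pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies a counted (not null-terminated) wide string
// into a new buffer and appends the terminator. lengthInBytes plays the
// role of FILE_NOTIFY_INFORMATION::FileNameLength, which is a byte count.
wchar_t* MakeNullTerminatedCopy(const wchar_t* pwFileName, std::size_t lengthInBytes)
{
    assert(lengthInBytes % sizeof(wchar_t) == 0);
    const std::size_t charCount = lengthInBytes / sizeof(wchar_t);

    wchar_t* pwNullTerminatedFileName = new wchar_t[charCount + 1];
    std::memcpy(pwNullTerminatedFileName, pwFileName, lengthInBytes);
    pwNullTerminatedFileName[charCount] = L'\0';
    return pwNullTerminatedFileName;
}

// Hypothetical helper: converts a null-terminated wide string with two
// calls to wcstombs(). Returns nullptr if a character cannot be converted
// in the current locale; otherwise the caller owns the returned buffer.
char* ConvertToMultiByte(const wchar_t* pwNullTerminatedFileName)
{
    // First call: a null destination asks only for the required size
    // in bytes, not counting the terminating null.
    const std::size_t requiredSize = std::wcstombs(nullptr, pwNullTerminatedFileName, 0);
    if (requiredSize == static_cast<std::size_t>(-1))
        return nullptr;

    // Second call: do the actual conversion into a buffer of the right size.
    char* pFileName = new char[requiredSize + 1];
    const std::size_t written = std::wcstombs(pFileName, pwNullTerminatedFileName, requiredSize + 1);
    if (written == static_cast<std::size_t>(-1))
    {
        delete[] pFileName;
        return nullptr;
    }
    pFileName[requiredSize] = '\0'; // defensive; wcstombs() already wrote it
    return pFileName;
}

// Usage, with fileInfo being the FILE_NOTIFY_INFORMATION from the question:
//   wchar_t* pwNullTerminatedFileName =
//       MakeNullTerminatedCopy(fileInfo.FileName, fileInfo.FileNameLength);
//   char* pFileName = ConvertToMultiByte(pwNullTerminatedFileName);
```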
Of course, don’t forget to call delete[] pwNullTerminatedFileName and delete[] pFileName when you’re done with them to clean up.

ONE LAST THING
After writing this answer, I reread your question a bit more closely and thought of another mistake you may be making. You say that wcstombs() fails after just converting the first two letters (“Ne”), which means that it’s hitting uninitialized data in the input string after the first two wide characters. Did you happen to use the assignment operator to copy one FILE_NOTIFY_INFORMATION variable to another? For example, fileInfo = someOtherFileInfo;

If you did this, it would only copy the first two wide characters of someOtherFileInfo.FileName to fileInfo.FileName. In order to understand why this is the case, consider the declaration of the FILE_NOTIFY_INFORMATION structure: three DWORD fields (NextEntryOffset, Action and FileNameLength) followed by WCHAR FileName[1].

When the compiler generates code for the assignment operation, it doesn’t understand the trickery that is being pulled with FileName being a variable-length field, so it just copies sizeof(FILE_NOTIFY_INFORMATION) bytes from someOtherFileInfo to fileInfo. Since FileName is declared as an array of one WCHAR, you would think that only one character would be copied, but the compiler pads the struct by an extra two bytes (so that its size is an integer multiple of the size of an int), which is why a second WCHAR is copied as well.
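
To make the size arithmetic concrete, here is a small sketch that mirrors the structure’s layout with fixed-width types. FileNotifyInfoMirror and CopyWholeEntry are hypothetical names used only for illustration; std::uint32_t and char16_t stand in for DWORD and WCHAR, on the assumption of the usual 4-byte alignment for 32-bit integers. It checks the two-character claim at compile time and shows one possible way to copy a whole entry, name included, out of the raw notification buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical mirror of FILE_NOTIFY_INFORMATION's layout.
struct FileNotifyInfoMirror
{
    std::uint32_t NextEntryOffset;
    std::uint32_t Action;
    std::uint32_t FileNameLength; // size of the name in bytes
    char16_t      FileName[1];    // really variable-length
};

// Bytes in front of the name, and how many name characters fit in sizeof().
constexpr std::size_t kHeaderSize = offsetof(FileNotifyInfoMirror, FileName);
constexpr std::size_t kCharsCopiedByAssignment =
    (sizeof(FileNotifyInfoMirror) - kHeaderSize) / sizeof(char16_t);

static_assert(kHeaderSize == 12, "three 4-byte fields precede the name");
static_assert(sizeof(FileNotifyInfoMirror) == 16, "14 bytes padded up to 16");
static_assert(kCharsCopiedByAssignment == 2, "plain assignment copies two characters");

// Copies one complete entry (header plus the full name) out of the raw
// buffer, using FileNameLength rather than sizeof() to find its extent.
std::vector<unsigned char> CopyWholeEntry(const unsigned char* pEntry)
{
    std::uint32_t fileNameLength = 0;
    std::memcpy(&fileNameLength,
                pEntry + offsetof(FileNotifyInfoMirror, FileNameLength),
                sizeof(fileNameLength));

    const std::size_t entrySize = kHeaderSize + fileNameLength;
    return std::vector<unsigned char>(pEntry, pEntry + entrySize);
}
```

The copy is sized from FileNameLength because sizeof() only knows about the one-element FileName array, which is exactly what trips up the assignment operator.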