I’m trying to figure out the safest way to retrieve Unicode data from remote computers in a uniform way, and to make sure that data stays consistent and readable.
Computer A: Chinese user, mixed-language English Windows 7, some registry values contain Chinese characters like L"您好"
Computer B: US English, my functions return no non-ASCII values
Computer C: Introduces an agent to Computer A and B.
The agent: assesses the health and security of the computer from the inside. One Unicode-aware section simply reads registry values, e.g.:
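int Utilities::GetRegistryStringValue(HKEY h_sub_key, WCHAR* value_name, wstring &result)
{
    DWORD cbData = 0;
    DWORD type = 0;
    // First call: get the type and the size in bytes of the value's data.
    // The explicit W variant is used because the arguments are wide strings.
    long err = RegQueryValueExW(h_sub_key, value_name, NULL, &type, NULL, &cbData);
    if (err != ERROR_SUCCESS)
    {
        if (err != ERROR_FILE_NOT_FOUND)
            debug->DebugMessage(Error::GetErrorMessageW(err));
        return err;
    }
    // Only single-string types can be returned as one wstring
    if (type != REG_SZ && type != REG_EXPAND_SZ)
        return ERROR_UNSUPPORTED_TYPE;
    // One extra WCHAR, because the stored data is not guaranteed to be null-terminated
    wstring buffer(cbData / sizeof(WCHAR) + 1, L'\0');
    // Second call: read the data itself
    err = RegQueryValueExW(h_sub_key, value_name, NULL, NULL, (LPBYTE) &buffer[0], &cbData);
    if (err != ERROR_SUCCESS)
    {
        if (err != ERROR_FILE_NOT_FOUND)
            debug->DebugMessage(Error::GetErrorMessageW(err));
        return err;
    }
    buffer[cbData / sizeof(WCHAR)] = L'\0';
    // Copy up to the first terminator; the buffer is freed automatically
    result = buffer.c_str();
    return ERROR_SUCCESS;
}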
Those values will be stored in an XML file.
Should that XML file be in UTF-16 or UTF-8?
Am I going to need to pass the remote system’s code page back for translation?
What other issues might I have?
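UTF-8 is the more common choice for data sent over a network because it is byte-oriented and has no endianness issues. With UTF-16 you have to specify the byte order for transmission (a byte order mark, or an explicit UTF-16LE/UTF-16BE label). If you’re using a Unicode encoding, you do not need a code page: the wide registry API already returns UTF-16, so nothing depends on the remote system’s locale.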
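To make the endianness point concrete, here is a minimal sketch that writes the same UTF-16 code units in both byte orders. SerializeUtf16 is a hypothetical helper name, not part of any Windows API, and it is plain portable C++ for illustration only.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units to bytes in an
// explicitly chosen byte order.
std::vector<std::uint8_t> SerializeUtf16(const std::u16string& text, bool big_endian)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text)
    {
        const std::uint8_t hi = static_cast<std::uint8_t>(unit >> 8);
        const std::uint8_t lo = static_cast<std::uint8_t>(unit & 0xFF);
        if (big_endian)
        {
            bytes.push_back(hi);
            bytes.push_back(lo);
        }
        else
        {
            bytes.push_back(lo);
            bytes.push_back(hi);
        }
    }
    return bytes;
}
```

For U+60A8 (您) this gives the bytes A8 60 in little-endian order and 60 A8 in big-endian order, so the receiver has to be told which one it is getting. The UTF-8 form of the same character is always E6 82 A8.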
On Windows machines you can do the conversion with standard API calls such as WideCharToMultiByte, passing CP_UTF8 as the target code page.
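A rough sketch of that conversion, assuming the registry value is already in a std::wstring. WideToUtf8 is a hypothetical helper name. The non-Windows branch is a hand-written fallback added only so the sketch compiles elsewhere; it is not part of the Windows API.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: convert a wide string to UTF-8 bytes.
std::string WideToUtf8(const std::wstring& wide)
{
    if (wide.empty())
        return std::string();
#ifdef _WIN32
    const int length = static_cast<int>(wide.size());
    // First call: ask how many bytes the UTF-8 form needs.
    const int size = WideCharToMultiByte(CP_UTF8, 0, wide.data(), length,
                                         NULL, 0, NULL, NULL);
    if (size <= 0)
        return std::string();
    std::string utf8(static_cast<std::size_t>(size), '\0');
    // Second call: perform the conversion into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), length,
                        &utf8[0], size, NULL, NULL);
    return utf8;
#else
    // Fallback for non-Windows builds: encode each code point by hand.
    std::string utf8;
    for (std::size_t i = 0; i < wide.size(); ++i)
    {
        unsigned long cp = static_cast<unsigned long>(wide[i]);
        // Combine a UTF-16 surrogate pair into one code point if present.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size())
        {
            const unsigned long low = static_cast<unsigned long>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80)
        {
            utf8 += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
#endif
}
```

The returned bytes can be written to the XML file as they are, provided the file declares encoding="UTF-8". Markup characters such as & and < still need XML escaping, which this sketch does not do.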