I have the lovely functions from my previous question, which work fine if I do this:
wstring temp;
wcin >> temp;
string whatever( toUTF8(getSomeWString()) );
// store whatever, copy, but do not use it as UTF8 (see below)
wcout << toUTF16(whatever) << endl;
The original form is reproduced, but the intermediate form often contains extra characters. For example, if I enter àçé as input and add a cout << whatever statement, I get ┬à┬ç┬é as output.
Can I still use this string to compare against others obtained from an ASCII source? Or, asked differently: if I output ┬à┬ç┬é through a UTF-8 cout on Linux, would it read àçé? Is the byte content of the string àçé, read by cin on a UTF-8 Linux system, exactly the same as what the Win32 API gives me?
Thanks!
PS: the reason I’m asking is because I need to use the string a lot to compare to other read values (comparing and concatenating…).
Let me start by saying that there appears to be simply no way to output UTF-8 text to the console in Windows via cout (assuming you compile with Visual Studio).
What you can do for your tests, however, is output your UTF-8 text via the Win32 API function WriteConsoleA:
This should output:
Umlaut AE = Ä / ue = ü
if you set your console (cmd.exe) to use the Lucida Console font.
As for your question (taken from your comment), I will say yes: given a Unicode character sequence, its UTF-16 (Windows wchar_t) representation converted to a UTF-8 (char) representation via the WideCharToMultiByte function will always yield the same byte sequence.
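To make the "same byte sequence" point concrete, here is a minimal, portable sketch of the UTF-16 to UTF-8 mapping. It is not the toUTF8 function from the question and does not call the Win32 API; utf16ToUtf8 and toHex are hypothetical helper names introduced only for illustration, and the encoder assumes well-formed input. Because the mapping is fixed by the encodings themselves, WideCharToMultiByte with CP_UTF8 should produce the same bytes for the same well-formed UTF-16 input.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: hand-rolled UTF-16 -> UTF-8 encoder.
// Assumes well-formed input; unpaired surrogates are not validated.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: render bytes as space-separated uppercase hex,
// which is safer for inspection than printing them to a console.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling toHex(utf16ToUtf8(u"\u00E0\u00E7\u00E9")) on the string àçé gives C3 A0 C3 A7 C3 A9. Dumping the bytes of your own string as hex in the same way sidesteps the console code page entirely, so you can compare what you actually stored against what a UTF-8 Linux terminal would hand to cin.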