I’ve got this code:
string test("żaba");
cout << "Word: " << test << endl;
cout << "Length: " << test.size() << endl;
cout << "Letter: " << test.at(0) << endl;
The output is strange:
Word: żaba
Length: 5
Letter: �
As you can see, the length should be 4 and the letter should be “ż”.
How can I correct this code to work properly?
std::string on non-Windows platforms is usually used to store UTF-8 strings (UTF-8 being the default encoding on most sane operating systems this side of 2010), but it is a “dumb” container in the sense that it doesn’t know or care anything about the bytes you’re storing. In UTF-8, “ż” is encoded as two bytes, so size() reports 5 bytes rather than 4 characters, and at(0) returns only the first byte of “ż”, which is not a printable character on its own. std::string will work for reading, storing, and writing, but not for character-level string manipulation.

For that, you need to use the excellent and well-maintained IBM ICU: International Components for Unicode. It’s a C/C++ library for *nix or Windows into which a ton of research has gone to provide a culture-aware string library, including case-insensitive string comparison that’s both fast and accurate.
Another good project, and one that’s easier for C++ devs to switch to, is UTF8-CPP.