Just curious about the encodings the system uses when storing strings (if it cares) and when printing them.
Question 1: If I store a one-byte string in std::string or a two-byte string in std::wstring, will the underlying integer values differ depending on the encoding currently in use? (I remember Bjarne saying that an encoding is the mapping between chars and integer(s), so a char should be stored as integer(s) in memory, and different encodings don’t necessarily have the same mapping.)
Question 2: If so, must std::string and std::wstring know the encoding themselves (although someone else told me this is NOT true)? Otherwise, how can they translate each char to the correct integers and store them? How does the system know the encoding?
Question 3: What is the default encoding on a particular system, and how do I change it (is this the so-called “locale”)? I guess the same mechanism is involved?
Question 4: If I print a string to the screen with std::cout, is the same encoding used?
Not quite. Make sure you understand one important distinction.
Encoding is converting a sequence of characters to a sequence of bytes. Decoding is converting a sequence of bytes to a sequence of characters.
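To make the "characters to bytes" direction concrete, here is a minimal sketch of an encoder for one particular encoding, UTF-8. Note that encode_utf8 is a hypothetical helper written for illustration, not a standard library function, and the sketch assumes a C++11 compiler (for char32_t).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the standard library):
// encode a single Unicode code point as a sequence of UTF-8 bytes.
std::string encode_utf8(char32_t cp) {
    // Reject values outside the Unicode range and UTF-16 surrogates.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encode_utf8(0x20AC) (the euro sign) yields the three bytes 0xE2 0x82 0xAC, so size() on the result is 3 even though it holds a single character. A different encoding would map the same character to different bytes.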
The confusing thing for C and C++ programmers is that char means byte, NOT character! The name char for the byte type is a legacy from the pre-Unicode days when everyone (except East Asians) used single-byte encodings. But nowadays we have Unicode, whose encoding schemes use up to 4 bytes per character.

On Question 1: Yes, it will. Suppose you have

std::string euro = "€";

Then the bytes stored in euro depend on the encoding in use. For example, the string is the single byte 0x80 in Windows-1252, the single byte 0xA4 in ISO-8859-15, and the three bytes 0xE2 0x82 0xAC in UTF-8.

On Question 3: Depends on the platform. On Unix, the encoding can be specified as part of the LANG environment variable. Windows has a GetACP function to get the “ANSI” code page number.

On Question 4: Not necessarily. On Windows, the command line uses the “OEM” code page, which is usually different from the “ANSI” code page used elsewhere.
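If you want to check this on your own machine, a rough sketch along these lines may help. Both hex_bytes and default_narrow_encoding are hypothetical helpers made up for this example. The non-Windows branch assumes a POSIX system that provides nl_langinfo, and it calls std::setlocale, which changes the process-wide locale. A C++11 compiler is assumed.

```cpp
#include <cassert>
#include <clocale>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#endif

// Hypothetical helper: show the raw bytes held by a std::string
// as space-separated hex pairs, e.g. "E2 82 AC".
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : s) {
        if (!out.empty()) out += ' ';
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}

// Hypothetical helper: report the platform's default narrow-character encoding.
std::string default_narrow_encoding() {
#ifdef _WIN32
    // The "ANSI" code page number, e.g. 1252. The console's "OEM" code page
    // is a separate setting (see GetOEMCP / GetConsoleOutputCP).
    return "CP" + std::to_string(GetACP());
#else
    // Adopt the locale named by the environment (LANG, LC_ALL, ...).
    // Note: this changes the global C locale for the whole process.
    std::setlocale(LC_ALL, "");
    const char* name = nl_langinfo(CODESET);
    assert(name != nullptr);
    return name;
#endif
}
```

Calling hex_bytes on a string literal such as "€" shows whichever bytes your compiler chose for it, which depends on the source and execution character sets. Writing the bytes out explicitly, as in "\xE2\x82\xAC", removes that dependency.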