Just curious about the encodings the system uses when storing strings (if it cares) and printing them


Asked: May 17, 2026


I'm curious about which encodings the system uses when storing strings (if it cares at all) and when printing them.

Question 1: If I store a one-byte string in std::string or a two-byte string in std::wstring, will the underlying integer values differ depending on the encoding currently in use? (I remember Bjarne saying that an encoding is the mapping between characters and integer(s), so a char should be stored as integer(s) in memory, and different encodings don't necessarily use the same mapping.)

Question 2: If so, must std::string and std::wstring know the encoding themselves (although someone else told me this is NOT true)? Otherwise, how can they translate characters into the correct integers and store them? How does the system know the encoding?

Question 3: What is the default encoding on a particular system, and how can I change it (is this the so-called "locale")? I guess the same mechanism applies?

Question 4: If I print a string to the screen with std::cout, is the same encoding used?


Editorial Team · Answer 1 · 2026-05-17T23:11:52+00:00

(I remember that Bjarne says that encoding is the mapping between char and integer(s) so char should be stored as integer(s) in memory)

Not quite. Make sure you understand one important distinction.

A character is the minimum unit of text. A letter, digit, punctuation mark, symbol, space, etc.
A byte is the minimum unit of memory. On the overwhelming majority of computers, this is 8 bits.

Encoding is converting a sequence of characters to a sequence of bytes. Decoding is converting a sequence of bytes to a sequence of characters.

The confusing thing for C and C++ programmers is that char means byte, NOT character! The name char for the byte type is a legacy of the pre-Unicode days, when everyone (except East Asian users, whose scripts needed multi-byte encodings) used single-byte encodings. Nowadays we have Unicode, whose encoding schemes use up to 4 bytes per character: UTF-8 uses 1 to 4, UTF-16 uses 2 or 4, and UTF-32 always uses 4.

Question 1: If I store one-byte string in std::string or two-byte string in std::wstring, will the underlying integer value depend on the encoding currently in use?

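Yes, it will. Suppose you have std::string euro = "€"; The bytes that end up in the string depend on the encoding that produced them (for a string literal, the compiler's execution character set):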

With the windows-1252 encoding, the string will be encoded as the byte 0x80.
With the ISO-8859-15 encoding, the string will be encoded as the byte 0xA4.
With the UTF-8 encoding, the string will be encoded as the three bytes 0xE2, 0x82, 0xAC.

Question 3: What is the default encoding in one particular system, and how to change it (Is it so-called “locale”)?

It depends on the platform. On Unix-like systems, the encoding is usually specified as part of the LANG environment variable (LC_ALL or LC_CTYPE, if set, take precedence):

~$ echo $LANG
en_US.utf8

Windows has a GetACP function to get the “ANSI” code page number.

Question 4: What if I print a string to the screen with std::cout, is it the same encoding?

Not necessarily. On Windows, the console uses the “OEM” code page by default (for example, 437 on US English systems), which usually differs from the “ANSI” code page (for example, 1252) used elsewhere.
