The functions c32rtomb and mbrtoc32 from <cuchar> / <uchar.h> are described in the C

Asked: June 13, 2026 (2026-06-13T10:03:40+00:00)

The functions c32rtomb and mbrtoc32 from <cuchar>/<uchar.h> are described in the C Unicode TR (draft) as performing conversions between UTF-32¹ and “multibyte characters”.

(…) If s is not a null
pointer, the c32rtomb function determines the number of bytes needed to represent
the multibyte character that corresponds to the wide character given by c32
(including any shift sequences), and stores the multibyte character representation in
the array whose first element is pointed to by s. (…)

What is this “multibyte character representation”? I’m actually interested in the behaviour of the following program:

#include <cassert>
#include <cuchar>
#include <string>

int main() {
    std::u32string u32 = U"this is a wide string";
    std::string narrow  = "this is a wide string";
    std::string converted(1000, '\0');
    char* ptr = &converted[0];
    std::mbstate_t state {};
    for(auto u : u32) {
        ptr += std::c32rtomb(ptr, u, &state);
    }
    converted.resize(ptr - &converted[0]);
    assert(converted == narrow);
}

Is the assertion in it guaranteed to hold¹?

¹ Working under the assumption that __STDC_UTF_32__ is defined.

1 Answer

Editorial Team · Answer 1 · 2026-06-13T10:03:42+00:00

For the assertion to be guaranteed to hold, the multibyte encoding used by c32rtomb() must be the same as the encoding used for narrow string literals, at least for the characters that actually appear in the string.

C99 7.11.1.1/2 specifies that setlocale() with the category LC_CTYPE affects the behavior of the character handling functions and the multibyte and wide character functions. I don’t see any explicit acknowledgement that the effect is to set the multibyte and wide character encodings used, but that is the intent.

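Your program never calls setlocale(), and at program startup the equivalent of setlocale(LC_ALL, “C”) is executed (C99 7.11.1.1/4). So the multibyte encoding used by c32rtomb() is the multibyte encoding of the default “C” locale.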
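The locale in effect can be queried without changing it by passing a null pointer to setlocale(). The following is only a minimal sketch: it assumes the implementation reports the startup locale under the name "C", which is common but is not spelled out by the standard, and ctype_locale_is_c and check_startup_locale are hypothetical helper names.

```cpp
#include <cassert>
#include <clocale>
#include <cstring>

// Hypothetical helper: reports whether the current LC_CTYPE locale is named "C".
// A null second argument makes setlocale() query the locale instead of setting it.
bool ctype_locale_is_c() {
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    return name != nullptr && std::strcmp(name, "C") == 0;
}

// Hypothetical check, meant to be called before any setlocale() call that
// changes the locale. Assumes the implementation names the startup locale "C".
void check_startup_locale() {
    assert(ctype_locale_is_c());
}
```

If the program had called setlocale(LC_ALL, "") earlier, the helper would typically report the user's environment locale instead, and the reasoning about the “C” locale would no longer apply.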

C++11 2.14.3/2 specifies that the execution encoding, wide execution encoding, UTF-16, and UTF-32 are used for the corresponding character and string literals. Therefore std::string narrow uses the execution encoding to represent that string.

So is the “C” locale encoding of this string the same as the execution encoding of this string?

C99 7.11.1.1/3 specifies that the “C” locale provides “the minimal environment” for C translation. Such an environment would include not only character sets, but also the specific character codes used. So I believe this means not only that the “C” locale must support the characters required in translation (i.e., the basic character set), but additionally that those characters in the “C” locale must use the same character codes.

All of the characters in your string literals are members of the basic character set, and therefore converting the char32_t representation to the char “C” locale representation must produce the same sequence of values as the compiler produces for the char string literal; the assertion must hold true.

I don’t see any suggestion that anything beyond the basic character set is supported in a compatible way between the execution encoding and the “C” locale, so if your string literal used any characters outside the basic character set then there would not be any guarantee that the assertion would hold. Even stipulating extended characters that exist in both the execution character set and the “C” locale, I don’t see any requirement that the representations match each other.
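Because conversion of characters outside the basic character set may fail, the return value of c32rtomb() is worth checking: it returns (size_t)-1 on an encoding error, and the loop in the question would add that value to the pointer. A more defensive version of the loop might look like the sketch below; to_multibyte and check_question_string are hypothetical names, and the buffer size assumes a single call never writes more than MB_LEN_MAX bytes.

```cpp
#include <cassert>
#include <climits>  // MB_LEN_MAX
#include <cstddef>  // std::size_t
#include <cuchar>   // std::c32rtomb
#include <cwchar>   // std::mbstate_t
#include <string>

// Hypothetical helper: converts UTF-32 to the current locale's multibyte
// encoding. Returns false, leaving a partial result in `out`, if any
// character cannot be represented.
bool to_multibyte(const std::u32string& in, std::string& out) {
    out.clear();
    std::mbstate_t state{};
    char buf[MB_LEN_MAX];
    for (char32_t c : in) {
        std::size_t n = std::c32rtomb(buf, c, &state);
        if (n == static_cast<std::size_t>(-1)) {
            return false;  // encoding error: no multibyte form in this locale
        }
        out.append(buf, n);
    }
    return true;
}

// Usage mirroring the question's program: only basic-character-set input.
void check_question_string() {
    std::string converted;
    bool ok = to_multibyte(U"this is a wide string", converted);
    assert(ok && converted == "this is a wide string");
}
```

For input containing extended characters, whether to_multibyte succeeds and which bytes it produces depend on the implementation's “C” locale, which is exactly the case where no guarantee applies.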
