I am writing a class method that will convert a UTF8 character into its


Asked: June 11, 2026 (2026-06-11T12:25:55+00:00)

I am writing a class method that will convert a UTF-8 character into its corresponding Unicode code point. My candidate prototypes are the ones below:

static uint32_t Utf8ToWStr( uint8_t Byte1,        uint8_t Byte2 = 0x00,
                            uint8_t Byte3 = 0x00, uint8_t Byte4 = 0x00,
                            uint8_t Byte5 = 0x00, uint8_t Byte6 = 0x00);

static uint32_t Utf8ToWStr(const std::vector<uint8_t> & Bytes);

In my applications:
Byte1 will be the only non-zero byte approximately 90% of the time.
Byte1 and Byte2 will be the only non-zero bytes approximately 9% of the time.
Byte1, Byte2 and Byte3 will be the only non-zero bytes less than 1% of the time.
Byte4, Byte5 and Byte6 will almost always be zero.

Which prototype should I prefer for speed?


1 Answer

Editorial Team · Answer 1 · 2026-06-11T12:25:56+00:00

Probably neither.

Think of the code calling this function. It will likely have to jump through massive hoops to use it:

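uint8_t c1 = *cursor++;
uint8_t c2 = 0;
uint8_t c3 = 0;
uint8_t c4 = 0;
uint8_t c5 = 0;
uint8_t c6 = 0;
if(c1 >= 0xc0)          // lead byte of a sequence of 2 or more bytes
    c2 = *cursor++;
if(c1 >= 0xe0)          // 3 or more bytes
    c3 = *cursor++;
if(c1 >= 0xf0)          // 4 or more bytes
    c4 = *cursor++;
if(c1 >= 0xf8)          // 5 or more bytes (obsolete long forms)
    c5 = *cursor++;
if(c1 >= 0xfc)          // 6 bytes (obsolete long form)
    c6 = *cursor++;
uint32_t wch = Utf8ToWStr(c1, c2, c3, c4, c5, c6);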

I sincerely doubt this interface is useful.

My normal interface for conversion routines is

bool utf8_to_wchar(uint8_t const *&cursor, uint8_t const *end, uint32_t &result);

The return value is used to convey errors. For example, how would your function react to the parameters (0x81, 0x00)?
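To make that interface concrete, here is a minimal sketch of how such a routine might look. It is not a definitive implementation, and it rests on a few assumptions of mine: it accepts only sequences of up to four bytes (the limit in current UTF-8), it leaves cursor untouched on failure, and it does not check for overlong encodings or surrogates. The single-byte case, which the question says covers about 90% of the input, is handled first.

```cpp
#include <cstdint>

// Sketch only. On success: stores the code point in `result`, advances
// `cursor` past the sequence, returns true. On failure: returns false and
// leaves `cursor` unchanged (an assumption, not part of the original answer).
bool utf8_to_wchar(uint8_t const *&cursor, uint8_t const *end, uint32_t &result)
{
    if (cursor == end)
        return false;                       // nothing left to read

    uint8_t const *p = cursor;
    uint8_t const lead = *p++;

    // Fast path: single-byte (ASCII) sequences, the most common case.
    if (lead < 0x80) {
        result = lead;
        cursor = p;
        return true;
    }

    int extra;        // number of continuation bytes expected
    uint32_t value;   // payload bits taken from the lead byte
    if ((lead & 0xE0) == 0xC0)      { extra = 1; value = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; value = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; value = lead & 0x07; }
    else
        return false;                       // stray continuation or invalid lead

    if (end - p < extra)
        return false;                       // sequence is truncated

    for (int i = 0; i < extra; ++i) {
        uint8_t const c = *p++;
        if ((c & 0xC0) != 0x80)
            return false;                   // not a continuation byte
        value = (value << 6) | (c & 0x3F);
    }

    result = value;
    cursor = p;
    return true;
}
```

A caller can then loop with while (cursor != end && utf8_to_wchar(cursor, end, wch)) and never has to look at lead bytes itself. With this sketch, the input (0x81, 0x00) is rejected because 0x81 is a continuation byte, not a lead byte.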

Last but not least, you might want to have a mode that specifies whether denormalized (overlong) UTF-8 should give an error. From a security point of view it is a good idea to disallow encoding U+003F as 0xC0 0xBF.
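One way such a check could work is to compare the decoded value against the smallest code point that genuinely needs a sequence of that length. The helper below is a hypothetical sketch (the name is_shortest_form and its parameters are mine, not from the question), limited to sequence lengths of one to four bytes.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if `value`, decoded from a UTF-8
// sequence of `length` bytes, could not have been encoded in fewer bytes.
// Lengths outside 1..4 are treated as invalid and yield false.
bool is_shortest_form(uint32_t value, int length)
{
    // Smallest code point that needs 1, 2, 3, or 4 bytes respectively.
    static uint32_t const min_value[] = {0x0, 0x80, 0x800, 0x10000};
    if (length < 1 || length > 4)
        return false;
    return value >= min_value[length - 1];
}
```

For example, the two-byte sequence 0xC0 0xBF decodes to 0x3F, which is below the two-byte minimum of 0x80, so is_shortest_form(0x3F, 2) returns false and a strict decoder would report an error.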
