The Archive Base



Editorial Team
Asked: June 11, 2026


I am writing a class method that will convert a UTF-8 character into its Unicode code point. My prototype candidates are the ones below:

static uint32_t Utf8ToWStr( uint8_t Byte1,        uint8_t Byte2 = 0x00,
                            uint8_t Byte3 = 0x00, uint8_t Byte4 = 0x00,
                            uint8_t Byte5 = 0x00, uint8_t Byte6 = 0x00);

static uint32_t Utf8ToWStr(const std::vector<uint8_t> & Bytes);

In my applications:
Byte1 will be the only non-zero byte approximately 90% of the time.
Byte1 and Byte2 will be the only non-zero bytes approximately 9% of the time.
Byte1, Byte2 and Byte3 will be the only non-zero bytes less than 1% of the time.
Byte4, Byte5 and Byte6 will almost always be zero.

Which prototype should I prefer for speed?


1 Answer

    Editorial Team
    Added an answer on June 11, 2026 at 12:25 pm

    Probably neither.

    Think of the code that calls this function: the caller would likely have to jump through considerable hoops to use it:

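    // The caller must inspect the lead byte itself to find out how many
    // continuation bytes to read before it can even call the function.
    uint8_t c1 = *cursor++;
    uint8_t c2 = 0;
    uint8_t c3 = 0;
    uint8_t c4 = 0;
    uint8_t c5 = 0;
    uint8_t c6 = 0;
    if(c1 >= 0xc0)          // 110xxxxx: at least a 2-byte sequence
        c2 = *cursor++;
    if(c1 >= 0xe0)          // 1110xxxx: at least 3 bytes
        c3 = *cursor++;
    if(c1 >= 0xf0)          // 11110xxx: at least 4 bytes
        c4 = *cursor++;
    if(c1 >= 0xf8)          // 111110xx: 5 bytes (obsolete form)
        c5 = *cursor++;
    if(c1 >= 0xfc)          // 1111110x: 6 bytes (obsolete form)
        c6 = *cursor++;
    // Still no check against the end of the buffer, and no check that
    // c2..c6 really are continuation bytes (10xxxxxx).
    uint32_t wch = Utf8ToWStr(c1, c2, c3, c4, c5, c6);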
    

    I sincerely doubt this interface is useful.

    My usual interface for conversion routines is:

    bool utf8_to_wchar(uint8_t const *&cursor, uint8_t const *end, uint32_t &result);
    

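    The cursor is passed by reference so the routine can advance it past the bytes it consumed, end lets it detect a sequence truncated by the end of the buffer, and the return value conveys errors. For example, how would your function react to the arguments (0x81, 0x00)? 0x81 is a continuation byte, so it cannot start a sequence at all.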
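A minimal sketch of that interface follows, assuming the RFC 3629 definition of UTF-8: sequences of at most four bytes, code points up to U+10FFFF, UTF-16 surrogates excluded. The five- and six-byte forms allowed by the question's prototype are no longer valid UTF-8. This version always rejects overlong forms instead of offering a configurable mode; the body is illustrative, not a reference implementation.

```cpp
#include <cstdint>

// Decodes one UTF-8 sequence starting at cursor. On success, stores the
// code point in result, advances cursor past the sequence and returns true.
// On failure, returns false and leaves both cursor and result unchanged.
// Assumes RFC 3629 rules: at most 4 bytes, no overlong forms,
// no surrogates, nothing above U+10FFFF.
bool utf8_to_wchar(uint8_t const *&cursor, uint8_t const *end, uint32_t &result)
{
    if (cursor >= end)
        return false;                       // no input left

    uint8_t const lead = cursor[0];
    if (lead < 0x80) {                      // ASCII: the common one-byte case
        result = lead;
        ++cursor;
        return true;
    }

    int length;                             // total bytes in the sequence
    uint32_t cp;                            // payload bits collected so far
    uint32_t min_cp;                        // smallest value legal for this length
    if ((lead & 0xE0) == 0xC0)      { length = 2; cp = lead & 0x1F; min_cp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; cp = lead & 0x0F; min_cp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; cp = lead & 0x07; min_cp = 0x10000; }
    else
        return false;                       // stray continuation byte (0x80-0xBF) or 0xF8-0xFF

    if (end - cursor < length)
        return false;                       // sequence truncated by end of buffer

    for (int i = 1; i < length; ++i) {
        uint8_t const c = cursor[i];
        if ((c & 0xC0) != 0x80)
            return false;                   // not a continuation byte (10xxxxxx)
        cp = (cp << 6) | (c & 0x3F);
    }

    if (cp < min_cp)
        return false;                       // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;                       // surrogates are not valid code points here
    if (cp > 0x10FFFF)
        return false;                       // beyond the Unicode range

    result = cp;
    cursor += length;
    return true;
}
```

A caller then decodes a buffer with a loop such as `while (p != end && utf8_to_wchar(p, end, cp)) { ... }`. Because the cursor is left untouched on failure, the caller decides whether to stop, skip a byte, or substitute U+FFFD. Testing for ASCII first means the question's 90% single-byte case returns after the bounds check and one comparison.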

    Last but not least, you might want a mode that specifies whether overlong (non-shortest-form) UTF-8 should be treated as an error. From a security point of view it is a good idea to reject, for example, U+003F encoded as the two bytes 0xC0 0xBF instead of the single byte 0x3F.
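To make the overlong case concrete: the two-byte pattern 110xxxxx 10xxxxxx carries 11 payload bits, so nothing stops an encoder from putting a small value such as 0x3F into it. The sketch below uses hypothetical helper names (not from any library) to show that arithmetic and the minimum-value check a strict mode would apply to a code point decoded from a given number of bytes.

```cpp
#include <cstdint>

// Naive two-byte decode: combines the 5 payload bits of the lead byte with
// the 6 payload bits of the continuation byte, with no validity checks.
constexpr uint32_t naive_decode2(uint8_t lead, uint8_t cont)
{
    return (uint32_t(lead & 0x1F) << 6) | uint32_t(cont & 0x3F);
}

// Smallest code point that genuinely needs `length` bytes in UTF-8
// (RFC 3629 lengths only). Any smaller value encoded in `length` bytes
// is overlong. One-byte sequences can never be overlong.
constexpr uint32_t min_code_point(int length)
{
    return length == 2 ? 0x80
         : length == 3 ? 0x800
         : length == 4 ? 0x10000
         : 0;
}

constexpr bool is_overlong(uint32_t cp, int length)
{
    return cp < min_code_point(length);
}

// 0xC0 0xBF decodes to '?' (U+003F) even though it never contains the
// byte 0x3F, so a filter scanning raw bytes for '?' would miss it.
static_assert(naive_decode2(0xC0, 0xBF) == 0x3F, "overlong '?'");
static_assert(is_overlong(naive_decode2(0xC0, 0xBF), 2), "strict mode rejects it");
static_assert(!is_overlong(naive_decode2(0xC3, 0xA9), 2), "U+00E9 is a legitimate 2-byte form");
```

The same minimum-value table extends to three and four bytes: 0x7FF encoded in three bytes or 0xFFFF encoded in four bytes are also overlong.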



Related Questions

I have a simple Screen class in C# that has a bunch of events
I'm trying to embed a PDF file into a Word document using the OLE
UPDATE: It was suggested in the comments that I create a wiki for this.
I am trying to start up an Intent from within a class which implements
I am wrinting a shell script and have a variable like this: something-that-is-hyphenated .
Several online searches give me the impression that very few people like to write
I'm wriging a wrapper for C++ of a function declared in this way: class
I'm stuck wringing to change the view by a segue after the user presses
We have a pretty cool little web framework that we have used successfully on
So we have recently started developing applications for the iPad for our company. Unfortunately



© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base
