The Archive Base

Editorial Team
Asked: May 12, 2026 (2026-05-12T07:26:32+00:00)

I have a multi-byte string containing a mixture of Japanese and Latin characters, and I’m trying to copy parts of it to a separate memory location. Because it’s a multi-byte string, some characters use one byte and others use two. When copying part of the string, I must not copy “half” of a Japanese character. To do this properly, I need to determine where each character in the multi-byte string starts and ends.

As an example, if the string contains 3 characters that require [2 bytes][2 bytes][1 byte], I must copy either 2, 4 or 5 bytes to the other location and not 3, because copying 3 bytes would copy only half of the second character.
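The rule in this example can be sketched in portable C++, assuming the multi-byte string is valid UTF-8 (an assumption; a DBCS code page such as Shift-JIS would need a different boundary test). The helper names `Utf8SafePrefixLength` and `CopyUtf8Prefix` are hypothetical and not part of any Windows API. The idea is to back the cut position up while the first byte that would be left out is a continuation byte (bit pattern 10xxxxxx).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the largest length <= maxBytes that does not
// split a character. ASSUMES s[0..len) is valid UTF-8.
std::size_t Utf8SafePrefixLength(const char* s, std::size_t len, std::size_t maxBytes)
{
    assert(s != nullptr || len == 0);
    if (maxBytes >= len)
        return len;  // the whole string fits, nothing can be split

    std::size_t n = maxBytes;
    // s[n] is the first byte that would NOT be copied. If it is a
    // continuation byte, the cut falls inside a character, so back up.
    while (n > 0 && (static_cast<unsigned char>(s[n]) & 0xC0) == 0x80)
        --n;
    return n;
}

// Hypothetical convenience wrapper that copies the safe prefix.
std::string CopyUtf8Prefix(const std::string& src, std::size_t maxBytes)
{
    return src.substr(0, Utf8SafePrefixLength(src.data(), src.size(), maxBytes));
}
```

With a [2 bytes][2 bytes][1 byte] string such as "éé" followed by "a" in UTF-8, asking for 3 bytes copies only 2, while asking for 4 or 5 copies exactly that many.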

To figure out where characters start and end in the multi-byte string, I’m trying to use the Windows API functions CharNext and CharNextExA, but without luck. When I use these functions, they step through my string one byte at a time rather than one character at a time. According to MSDN: “The CharNext function retrieves a pointer to the next character in a string.”

Here’s some code to illustrate this problem:

#include <windows.h>
#include <stdio.h>
#include <wchar.h>
#include <string.h>

/* string consisting of six "asian" characters */
wchar_t wcsString[] = L"\u9580\u961c\u9640\u963f\u963b\u9644";

int main()
{
   // Convert the asian string from wide char to multi-byte (UTF-8).
   const int bufferSize = 1000;
   LPSTR mbString = new char[bufferSize];
   WideCharToMultiByte(CP_UTF8, 0, wcsString, -1, mbString, bufferSize, NULL, NULL);

   // Count the number of characters in the string.
   int characterCount = 0;
   LPSTR currentCharacter = mbString;
   while (*currentCharacter)
   {
      characterCount++;

      // Expected to advance one character; with CP_UTF8 it advances one byte.
      currentCharacter = CharNextExA(CP_UTF8, currentCharacter, 0);
   }

   printf("characterCount = %d\n", characterCount);
   return 0;
}

(Please ignore the memory leak and the lack of error checking.)

Now, in the example above I would expect characterCount to be 6, since that’s the number of characters in the asian string. Instead, characterCount becomes 18, because mbString contains 18 bytes, which look like this when each byte is displayed as a separate character:

門阜陀阿阻附

I don’t understand how it’s supposed to work. How is CharNext supposed to know whether “é–€é” in the string is an encoded version of a Japanese character, or in fact the characters é – € and é?
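A function like this does not guess: it interprets the bytes according to the code page it is told to use. UTF-8 also makes character boundaries visible in the bytes themselves, because every continuation byte has the bit pattern 10xxxxxx. A minimal portable sketch, assuming the input is valid NUL-terminated UTF-8 (`CountUtf8CodePoints` is a hypothetical name, not a Windows API), counts only the bytes that start a character:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts code points in a NUL-terminated string that is
// ASSUMED to be valid UTF-8.
std::size_t CountUtf8CodePoints(const char* s)
{
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s)
    {
        // Continuation bytes look like 10xxxxxx; any other byte starts
        // a new character.
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the 18 bytes produced from the six-character string above, this counts 6. The same 18 bytes read under a single-byte code page would be 18 characters, which is why the caller has to state the encoding.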

Some notes:

  • I’ve read Joel’s blog post about what every developer needs to know about Unicode. I may have misunderstood something in it though.
  • If all I wanted to do was count the characters, I could count them in the wide-char string directly. Keep in mind that my real goal is copying parts of the multi-byte string to a separate location. The separate location only supports multi-byte, not wide char.
  • If I convert the content of mbString back to wide char using MultiByteToWideChar, I get the correct string (門阜陀阿阻附), which indicates that there’s nothing wrong with mbString.

EDIT:
Apparently the CharNext functions don’t support UTF-8, but Microsoft forgot to document that. I threw/copy-pasted together my own routine, which I won’t use and which needs improving. I’m guessing it’s easily crashable.

  LPSTR CharMoveNext(LPSTR szString)
  {
     if (szString == 0 || *szString == 0)
        return 0;

     if ( (szString[0] & 0x80) == 0x00)
        return szString + 1;
     else if ( (szString[0] & 0xE0) == 0xC0)
        return szString + 2;
     else if ( (szString[0] & 0xF0) == 0xE0)
        return szString + 3;
     else if ( (szString[0] & 0xF8) == 0xF0)
        return szString + 4;
     else
        return szString +1;
  }
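One way the routine above can run off the end of the buffer is a truncated sequence: a lead byte that promises two or three trailing bytes, followed directly by the terminator. A hedged sketch of a more defensive variant follows. `Utf8Next` is a hypothetical name, the input is assumed to be NUL-terminated, and, unlike `CharMoveNext`, it returns a pointer to the terminator at the end of the string instead of a null pointer.

```cpp
#include <cassert>

// Hypothetical hardened variant of CharMoveNext. ASSUMES NUL-terminated input.
// Returns a pointer to the next character, or to the terminator; it never
// steps past the terminator.
const char* Utf8Next(const char* s)
{
    assert(s != nullptr);
    if (*s == '\0')
        return s;

    const unsigned char lead = static_cast<unsigned char>(*s);
    int trailing = 0;                              // 0xxxxxxx: single byte
    if ((lead & 0xE0) == 0xC0)      trailing = 1;  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) trailing = 2;  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) trailing = 3;  // 11110xxx
    // Any other value (stray continuation byte or invalid lead) is skipped
    // as a single byte.

    ++s;
    // Consume only bytes that really are continuation bytes (10xxxxxx).
    // The terminator is not one, so a truncated sequence stops here.
    while (trailing > 0 && (static_cast<unsigned char>(*s) & 0xC0) == 0x80)
    {
        ++s;
        --trailing;
    }
    return s;
}
```

Looping with `for (const char* p = str; *p; p = Utf8Next(p))` visits one character per iteration. This sketch does not reject overlong or otherwise invalid encodings; it only avoids reading past the terminator.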
1 Answer

Editorial Team
Added an answer on May 12, 2026 at 7:26 am

The Sorting it All Out blog has a really good explanation of what is going on here: Is CharNextExA broken?. In short, CharNext is not designed to work with UTF-8 strings.

© 2021 The Archive Base. All Rights Reserved