The Archive Base

Editorial Team
Asked: May 20, 2026

Arrrgh. Does anyone know how to create a function that’s the multibyte character equivalent of the PHP count_chars($string, 3) command?

It should return a string containing ONLY ONE INSTANCE of each unique character. If the input were English and we had

“aaabggxxyxzxxgggghq xcccxxxzxxyx”

it would return “abcgh qxyz” (note that the space IS counted).

(The order isn’t important in this case, can be anything).

For Japanese kanji (not sure all browsers will display this):

漢漢漢字漢字私私字私字漢字私漢字漢字私

it should return just the 3 kanji used:

漢字私

It needs to work on any UTF-8 encoded string.

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 8:29 pm

    Hey Dave, you’re never going to see this one coming.

    php > $kanji = '漢漢漢字漢字私私字私字漢字私漢字漢字私';
    php > $not_kanji = 'aaabcccbbc';
    php > $pattern = '/(.)\1+/u';
    php > echo preg_replace($pattern, '$1', $kanji);
    漢字漢字私字私字漢字私漢字漢字私
    php > echo preg_replace($pattern, '$1', $not_kanji);
    abcbc
    

    What, you thought I was going to use mb_substr again?

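    In regex-speak, it’s looking for any one character, then one or more instances of that same character. The matched region is then replaced with the one character that matched. Note that this only collapses runs of adjacent repeats: a character that shows up again later in the string is kept, which is why both outputs above still contain duplicates.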

    The u modifier turns on UTF-8 mode in PCRE, in which it deals with UTF-8 sequences instead of 8-bit characters. As long as the string being processed is UTF-8 already and PCRE was compiled with Unicode support, this should work fine for you.
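The pattern above only collapses adjacent repeats. A minimal sketch of one way to get a single instance of every character using PCRE's UTF-8 mode, assuming the input is valid UTF-8: split on an empty pattern with the u modifier, then keep first occurrences with array_unique. The function name unique_chars_via_regex is hypothetical, made up for this illustration.

```php
<?php
// Hypothetical helper; the name is not part of PHP or of the original answer.
function unique_chars_via_regex(string $input): string
{
    // An empty pattern with the u modifier splits between UTF-8 characters.
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split returns false when the input is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // array_unique keeps the first occurrence of each value, in order.
    return implode('', array_unique($chars));
}

echo unique_chars_via_regex('漢漢漢字漢字私私字私字漢字私漢字漢字私'), "\n"; // 漢字私
echo unique_chars_via_regex('aaabcccbbc'), "\n";                               // abc
```

Unlike count_chars($string, 3), which returns characters sorted by byte value, this keeps them in first-seen order. The question says the order does not matter, so either should do.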


    Hey, guess what!

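    $not_kanji = 'aaabbbbcdddbbbbccgggcdddeeedddaaaffff';
    // Length in characters, not bytes (assumes the input is UTF-8).
    $l = mb_strlen($not_kanji, 'UTF-8');
    $unique = array();
    for ($i = 0; $i < $l; $i++) {
        // Pull out one whole character at a time.
        $char = mb_substr($not_kanji, $i, 1, 'UTF-8');
        // First sighting: create the key so insertion order is preserved.
        if (!array_key_exists($char, $unique)) {
            $unique[$char] = 0;
        }
        $unique[$char]++;
    }
    // Keys come back in first-seen order: abcdgef
    echo implode('', array_keys($unique));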
    

    This uses the same general trick as the shuffle code. We grab the length of the string, then use mb_substr to extract it one character at a time. We then use that character as a key in an array. We’re taking advantage of the fact that PHP arrays are ordered maps: keys stay in the order in which they were first inserted. Once we’ve gone through the string and identified all of the characters, we grab the keys and join ’em back together in the order that they first appeared in the string. You also get a per-character count from this technique.

    This would have been much easier if there was such a thing as mb_str_split to go along with str_split. (PHP 7.4 and later do provide mb_str_split in the mbstring extension.)

    (No Kanji example here, I’m experiencing a copy/paste bug.)
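For what it's worth, here is a short sketch of the same idea on top of mb_str_split, assuming PHP 7.4 or later with the mbstring extension loaded. The helper names mb_unique_chars and mb_char_counts are hypothetical, chosen only for this example.

```php
<?php
// Assumes PHP 7.4+ with mbstring, where mb_str_split() is available.
// Both function names are hypothetical, not built-ins.

// Returns one instance of each character, in first-seen order.
function mb_unique_chars(string $input): string
{
    return implode('', array_unique(mb_str_split($input, 1, 'UTF-8')));
}

// Returns a character => count map, keyed in first-seen order.
function mb_char_counts(string $input): array
{
    return array_count_values(mb_str_split($input, 1, 'UTF-8'));
}

echo mb_unique_chars('漢漢漢字漢字私私字私字漢字私漢字漢字私'), "\n"; // 漢字私
print_r(mb_char_counts('aab'));
```

array_count_values gives the same per-character count as the loop, without the manual mb_substr indexing.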


    Here, try this on for size:

    function mb_count_chars_kinda($input) {
        $l = mb_strlen($input);
        $unique = array();
        for($i = 0; $i < $l; $i++) {
            $char = mb_substr($input, $i, 1);
            if(!array_key_exists($char, $unique))
                $unique[$char] = 0;
            $unique[$char]++;
        }
        return $unique;
    }
    
    function mb_string_chars_diff($one, $two) {
        $left = array_keys(mb_count_chars_kinda($one));
        $right = array_keys(mb_count_chars_kinda($two));
        return array_diff($left, $right);
    }
    
    print_r(mb_string_chars_diff('aabbccddeeffgg', 'abcde'));
    /* => 
    Array
    (
        [5] => f
        [6] => g
    )
    */
    

    You’ll want to call this twice, the second time with the arguments swapped. The output will be different: array_diff only gives you the values in the left side that are missing from the right, so you have to run it in both directions to get the whole story.
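A hedged sketch of running the comparison in both directions in one call. It is self-contained rather than built on the functions above, and it assumes valid UTF-8 input. The names chars_of and mb_string_chars_diff_both are hypothetical.

```php
<?php
// Hypothetical helper: unique characters of a UTF-8 string, first-seen order.
function chars_of(string $s): array
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    // In this sketch, invalid UTF-8 is simply treated as having no characters.
    return $chars === false ? [] : array_unique($chars);
}

// Hypothetical helper: runs array_diff in both directions.
function mb_string_chars_diff_both(string $one, string $two): array
{
    $left  = chars_of($one);
    $right = chars_of($two);
    return [
        // array_values reindexes, since array_diff preserves the original keys.
        'only_in_first'  => array_values(array_diff($left, $right)),
        'only_in_second' => array_values(array_diff($right, $left)),
    ];
}

print_r(mb_string_chars_diff_both('aabbccddeeffgg', 'abcdexyz'));
```

Here only_in_first holds f and g, and only_in_second holds x, y and z.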
