The Archive Base Latest Questions

Editorial Team
Asked: May 30, 2026

I’ve noticed the length method of std::string returns the length in bytes and the same method in std::u16string returns the number of 2-byte sequences.

I’ve also noticed that when a character or code point is outside of the BMP, length returns 4 rather than 2.

Furthermore, the Unicode escape sequence is limited to \unnnn, so any code point above U+FFFF cannot be inserted by an escape sequence.

In other words, there doesn’t appear to be support for surrogate pairs or code points outside of the BMP.

Given this, is the accepted or recommended practice to use a non-standard string manipulation library that understands UTF-8, UTF-16, surrogate pairs, and so on?

Does my compiler have a bug, or am I using the standard string manipulation methods incorrectly?

Example:

/*
* Example with the Unicode code points U+0041, U+4061, U+10196 and U+10197
*/

#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    std::string example1 = u8"A䁡𐆖𐆗";
    std::u16string example2 = u"A䁡𐆖𐆗";

    std::cout << "Escape Example: " << "\u0041\u4061\u10196\u10197" << "\n";
    std::cout << "Example: " << example1 << "\n";
    std::cout << "std::string Example length: " << example1.length() << "\n";
    std::cout << "std::u16string Example length: " << example2.length() << "\n";

    return 0;
}

Here is the result I get when compiled with GCC 4.7:

Escape Example: A䁡မ6မ7
Example: A䁡𐆖𐆗
std::string Example length: 12
std::u16string Example length: 6

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 2:23 pm

    At the risk of judging prematurely, it seems to me that the language used in the standards is slightly ambiguous (although the final conclusion is clear; see the end):

    In the description of char16_t literals (i.e. the u"..." ones like in your example), the size of a literal is defined as:

    The size of a char16_t string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u'\0'.

    And the accompanying note further clarifies:

    [ Note: The size of a char16_t string literal is the number of code units, not the number of characters. —end note ]

    This implies a definition of character and code unit. A surrogate pair is one character, but two code units.
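To make the character versus code unit distinction concrete, here is a minimal C++11 sketch. The names bmp_char, astral_char and check_code_units are hypothetical and chosen only for illustration. It also uses the eight-digit \U escape, which, unlike the four-digit \u form, can name code points above U+FFFF.

```cpp
#include <cassert>
#include <string>

// \u takes exactly four hex digits; \U takes eight and can name code points
// above U+FFFF. "\u10196" is therefore U+1019 followed by the digit '6'.
const std::u16string bmp_char    = u"\u4061";      // inside the BMP
const std::u16string astral_char = u"\U00010196";  // outside the BMP

// Literal size per the quoted rule: 1 character + 1 for the surrogate pair
// + 1 for the terminating u'\0' = 3 code units.
static_assert(sizeof(u"\U00010196") / sizeof(char16_t) == 3,
              "one non-BMP character occupies two code units plus terminator");

// Hypothetical demo function; call it from main() to run the checks.
void check_code_units() {
    assert(bmp_char.length() == 1);     // one code unit
    assert(astral_char.length() == 2);  // one character, two code units
    assert(astral_char[0] == 0xD800);   // high (leading) surrogate
    assert(astral_char[1] == 0xDD96);   // low (trailing) surrogate
}
```

This also explains the "မ6မ7" in the question's output: "\u10196" is read as U+1019 followed by a literal 6, not as U+10196.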

    However, the description of the length() method of std::basic_string (of which std::u16string is a specialization, namely std::basic_string<char16_t>) claims:

    Returns the number of characters in the string, i.e. std::distance(begin(), end()). It is the same as size().

    As it appears, the description of length() uses the word character to mean what the definition of char16_t calls a code unit.

    However, the conclusion of all of this is: the length is defined in code units, hence your compiler complies with the standard, and there will be continued demand for special libraries to provide proper counting of characters.
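As a rough illustration of what such a library does at its simplest, the sketch below counts code points rather than code units. The function names are hypothetical, and the code assumes well-formed UTF-16 and UTF-8 input; it does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in well-formed UTF-16 by skipping
// low (trailing) surrogates, 0xDC00..0xDFFF, so each pair is counted once.
std::size_t count_code_points_utf16(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) ++n;
    }
    return n;
}

// Hypothetical helper: counts code points in well-formed UTF-8 by skipping
// continuation bytes, which all have the bit pattern 10xxxxxx.
std::size_t count_code_points_utf8(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical demo using the question's code points
// U+0041, U+4061, U+10196 and U+10197.
void check_code_point_counts() {
    const std::u16string utf16 = u"A\u4061\U00010196\U00010197";
    // The same text as explicit UTF-8 bytes (1 + 3 + 4 + 4 bytes).
    const std::string utf8 = "A\xE4\x81\xA1\xF0\x90\x86\x96\xF0\x90\x86\x97";

    assert(utf16.length() == 6);   // code units, as in the question
    assert(utf8.length() == 12);   // bytes, as in the question
    assert(count_code_points_utf16(utf16) == 4);
    assert(count_code_points_utf8(utf8) == 4);
}
```

Note that a code point is still not always what a user perceives as a character (combining marks, for example), which is one reason dedicated Unicode libraries remain useful.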

    I used the references below:

    • For the definition of the size of char16_t literals: Here
    • For the description of std::basic_string::length(): Here