The Archive Base


Editorial Team
Asked: May 27, 2026

In my application I have to constantly convert string between std::string and std::wstring due


In my application I have to constantly convert strings between std::string and std::wstring due to different APIs (boost, win32, ffmpeg, etc.). Especially with ffmpeg, the strings end up going UTF-8 -> UTF-16 -> UTF-8 -> UTF-16 just to open a file.

Since UTF-8 is backward compatible with ASCII, I thought I would consistently store all my strings as UTF-8 in std::string and only convert to std::wstring when I have to call certain unusual functions.

This worked fairly well: I implemented to_lower, to_upper, and iequals for UTF-8. However, I then hit several dead ends: std::regex and regular string comparisons. To make this usable I would need to implement a custom ustring class based on std::string, re-implementing all the corresponding algorithms (including regex).

Basically, my conclusion is that UTF-8 is not very good for general usage, and the current std::string/std::wstring situation is a mess.

However, my question is: why are the default std::string and "" not simply changed to use UTF-8, especially as UTF-8 is backward compatible? Is there possibly some compiler flag that can do this? Of course, the STL implementation would need to be adapted automatically.

I’ve looked at ICU, but it is not very compatible with APIs that assume basic_string, e.g. no begin/end/c_str, etc.


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 7:36 am

    The main issue is the conflation of in-memory representation and encoding.

    None of the Unicode encodings is really amenable to text processing. Users generally care about graphemes (what’s on the screen), while the encodings are defined in terms of code points, and some graphemes are composed of several code points.

    As such, when one asks what the 5th character of "Hélène" (a French first name) is, the question is quite confusing:

    • In terms of graphemes, the answer is n.
    • In terms of code points, it depends on the representation of é and è: each can be represented either as a single code point or as a pair (a base letter followed by a combining diacritic). With single code points the 5th one is n; with pairs it is the second e.

    Depending on the source of the question (an end user in front of her screen or an encoding routine), the answer is completely different.
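To make the code point side of this concrete, here is a minimal sketch. The helper `count_code_points` and the two constants are hypothetical names introduced only for illustration, and the code assumes its input is valid UTF-8. It counts code points by skipping UTF-8 continuation bytes, and the two spellings of the same name give different counts:

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++n;  // lead byte or ASCII byte
    }
    return n;
}

// "Hélène" with precomposed letters: é = U+00E9, è = U+00E8.
const std::string helene_precomposed = "H\xC3\xA9l\xC3\xA8ne";

// "Hélène" with combining marks: e + U+0301 (acute), e + U+0300 (grave).
const std::string helene_decomposed = "He\xCC\x81le\xCC\x80ne";
```

Under these assumptions, `count_code_points(helene_precomposed)` is 6 while `count_code_points(helene_decomposed)` is 8, and `size()` reports 8 and 10 bytes respectively, even though both strings render as the same six graphemes.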

    Therefore, I think the real question is: why are we speaking about encodings here?

    Today that does not make sense; what we would need is two “views”: graphemes and code points.

    Unfortunately, the std::string and std::wstring interfaces were inherited from a time when people thought that ASCII was sufficient, and the progress made since didn’t really solve the issue.

    I don’t even understand why the in-memory representation should be specified; it is an implementation detail. All a user should want is:

    • to be able to read/write in UTF-* and ASCII
    • to be able to work on graphemes
    • to be able to edit a grapheme (to manage the diacritics)

    … who cares how it is represented? I thought that good software was built on encapsulation?
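As a rough sketch of what such an encapsulated type might look like: the `Text` class below is hypothetical, not an existing library. It assumes valid UTF-8 input, and it approximates graphemes as a base code point followed by marks from the Combining Diacritical Marks block (U+0300 to U+036F) only. Real grapheme segmentation follows the much larger rule set of Unicode Standard Annex #29.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper: callers see code points and graphemes, never bytes.
class Text {
public:
    // Decodes UTF-8 into code points. Assumes the input is valid UTF-8.
    static Text from_utf8(const std::string& utf8) {
        Text t;
        std::size_t i = 0;
        while (i < utf8.size()) {
            unsigned char lead = static_cast<unsigned char>(utf8[i]);
            std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
            // Keep only the payload bits of the lead byte.
            char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
            for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
            t.code_points_.push_back(cp);
            i += len;
        }
        return t;
    }

    // Encodes the stored code points back to UTF-8.
    std::string to_utf8() const {
        std::string out;
        for (char32_t cp : code_points_) {
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }

    // Code point view.
    std::size_t code_point_count() const { return code_points_.size(); }

    // Grapheme view (simplified): every non-combining code point starts one.
    std::size_t grapheme_count() const {
        std::size_t n = 0;
        for (char32_t cp : code_points_)
            if (!is_combining_mark(cp)) ++n;
        return n;
    }

    // Returns the code points of the grapheme at `index` (0-based):
    // a base code point plus any combining marks that follow it.
    std::u32string grapheme(std::size_t index) const {
        std::u32string cluster;
        std::size_t bases_seen = 0;
        for (char32_t cp : code_points_) {
            if (!is_combining_mark(cp)) ++bases_seen;
            if (bases_seen == index + 1) cluster += cp;
            else if (bases_seen > index + 1) break;
        }
        return cluster;
    }

private:
    // Simplification: only the Combining Diacritical Marks block.
    static bool is_combining_mark(char32_t cp) {
        return cp >= 0x0300 && cp <= 0x036F;
    }

    std::vector<char32_t> code_points_;  // storage is hidden from callers
};
```

With this sketch, both spellings of "Hélène" report six graphemes and `grapheme(4)` yields n for each, while `code_point_count()` still differs (6 versus 8). The vector of `char32_t` is just one possible storage choice; the point is that callers never depend on it.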

    Well, C cares, and we want interoperability… so I guess it will be fixed when C is.

