The Archive Base


Editorial Team
Asked: May 16, 2026


There are already a few questions relating to this problem. I think my question is a bit different because I don’t have an actual problem; I’m only asking out of academic interest. I know that Windows’s implementation of UTF-16 sometimes contradicts the Unicode standard (e.g. collation) or is closer to the old UCS-2 than to UTF-16, but I’ll keep the “UTF-16” terminology here for simplicity.

Background: In Windows, everything is UTF-16. Regardless of whether you’re dealing with the kernel, the graphics subsystem, the filesystem or whatever, you’re passing UTF-16 strings. There are no locales or charsets in the Unix sense. For compatibility with medieval versions of Windows, there is a thing called “codepages” that is obsolete but nonetheless supported. AFAIK, there is only one correct and non-obsolete function to write strings to the console, namely WriteConsoleW, which takes a UTF-16 string. A similar discussion applies to input streams, which I’ll ignore here.

However, I think this represents a design flaw in the Windows API: there is a generic function called WriteFile that can be used to write to all stream objects (files, pipes, consoles…), but this function is byte-oriented and doesn’t accept UTF-16 strings. The documentation suggests using WriteConsoleW, which is text-oriented, for console output, and WriteFile, which is byte-oriented, for everything else. Since both console streams and file objects are represented by kernel object handles and console streams can be redirected, every write to a standard output stream has to go through a function that checks whether the handle represents a console or a file, which breaks polymorphism. OTOH, I do think that Windows’s separation between text strings and raw bytes (which is mirrored in many other systems like Java or Python) is conceptually superior to Unix’s char* approach that ignores encodings and doesn’t distinguish between strings and byte arrays.
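To make the WriteFile/WriteConsoleW switch concrete, here is a minimal sketch of such a dispatch function. The names `write_text` and `utf16_to_utf8` are hypothetical, and the choice of UTF-8 for the redirected case is an assumption (nothing in the Windows API prescribes it). It relies on the fact that GetConsoleMode only succeeds for console handles. On non-Windows systems the sketch simply writes UTF-8 to stdout so that it still compiles.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Encode UTF-16 as UTF-8. Unpaired surrogates are replaced by U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: write UTF-16 text to standard output.
// Console handle -> WriteConsoleW (text-oriented).
// Anything else  -> WriteFile with UTF-8 bytes (assumed encoding).
bool write_text(const std::u16string& text) {
#ifdef _WIN32
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    DWORD written = 0;
    if (GetConsoleMode(h, &mode)) {
        // The length is given in UTF-16 code units, not bytes.
        return WriteConsoleW(h, text.data(), static_cast<DWORD>(text.size()),
                             &written, nullptr) != 0;
    }
    const std::string bytes = utf16_to_utf8(text);
    return WriteFile(h, bytes.data(), static_cast<DWORD>(bytes.size()),
                     &written, nullptr) != 0;
#else
    const std::string bytes = utf16_to_utf8(text);
    return std::fwrite(bytes.data(), 1, bytes.size(), stdout) == bytes.size();
#endif
}
```

A caller would use `write_text(u"Gr\u00FC\u00DFe\n")` for every write to standard output, which is exactly the per-write check described above.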

So my questions are: What should one do in this situation? And why isn’t this problem solved even in Microsoft’s own libraries? Both the .NET Framework and the C and C++ libraries seem to adhere to the obsolete codepage model. How would you design the Windows API or an application framework to circumvent this issue?

I think that the general problem (which is not easy to solve) is that all libraries assume that all streams are byte-oriented, and implement text-oriented streams on top of that. However, we see that Windows does have special text-oriented streams at the OS level, and the libraries are unable to deal with this. So in any case we would have to introduce significant changes to all standard libraries. A quick and dirty way would be to treat the console as a special byte-oriented stream that accepts only one encoding. This still requires circumventing the C and C++ standard libraries, because they don’t implement the WriteFile/WriteConsoleW switch. Is that correct?


1 Answer

Editorial Team
Added an answer on May 16, 2026 at 12:10 am

    The general strategy we use in most (cross-platform) applications and projects is this: we just use UTF-8 (I mean the real standard) everywhere. We use std::string as the container and interpret everything as UTF-8. We also handle all file I/O this way, i.e. we expect UTF-8 and save UTF-8. When we get a string from somewhere and know that it is not UTF-8, we convert it to UTF-8.

    The most common case where we stumble upon Windows UTF-16 is filenames. So whenever we handle a filename, we convert the UTF-8 string to Windows UTF-16, and we convert in the other direction when we search through a directory for files.
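A minimal sketch of that filename conversion, assuming a hand-written decoder rather than the answerer's actual code (which is not shown here). The names `utf8_to_utf16` and `open_utf8_path` are hypothetical. On Windows, MultiByteToWideChar with CP_UTF8 would do the same conversion job; the portable decoder is used only so the example also compiles elsewhere.

```cpp
#include <cstddef>
#include <cstdio>
#include <cwchar>
#include <string>

// Decode UTF-8 into UTF-16 code units. Malformed bytes become U+FFFD.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0xFFFD;
        std::size_t len = 1;
        bool ok = false;
        if (b < 0x80) { cp = b; ok = true; }
        else if (b >= 0xC2 && b <= 0xDF) { len = 2; cp = b & 0x1F; ok = true; }
        else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; ok = true; }
        else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = b & 0x07; ok = true; }
        if (ok && i + len > in.size()) ok = false;  // truncated sequence
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;     // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, encoded surrogates and out-of-range values.
        if (ok && len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ok = false;
        if (ok && len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) ok = false;
        if (!ok) { cp = 0xFFFD; len = 1; }
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}

// Hypothetical helper: open a file whose name is stored as UTF-8.
std::FILE* open_utf8_path(const std::string& path, const std::string& mode) {
#ifdef _WIN32
    // wchar_t is 16 bits on Windows, so code units can be copied one to one.
    const std::u16string p = utf8_to_utf16(path);
    const std::u16string m = utf8_to_utf16(mode);
    const std::wstring wpath(p.begin(), p.end());
    const std::wstring wmode(m.begin(), m.end());
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

The rest of the application keeps passing std::string around; only this thin layer at the filesystem boundary knows about UTF-16.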

    The console isn’t really used in our Windows build (there, all console output is redirected into a file). Since we use UTF-8 everywhere, our console output is UTF-8 too, which is fine for most modern systems. The Windows console log file is likewise UTF-8, and most text editors on Windows can read it without problems.

    If we used the Windows console more and cared a lot that all special characters are displayed correctly, we would probably write an automatic pipe handler, installed between fileno=1 (stdout) and the real stdout, that uses WriteConsoleW as you suggested (if there is really no easier way).

    If you wonder how to realize such an automatic pipe handler: we have already implemented one for all POSIX-like systems. The code probably doesn’t work on Windows as it is, but I think it should be possible to port it. Our current pipe handler is similar to what tee does, i.e. if you do a cout << "Hello" << endl, it is printed both on stdout and in a log file. Look at the code if you are interested in how this is done.



© 2021 The Archive Base. All Rights Reserved