The Archive Base Latest Questions

Editorial Team
Asked: June 10, 2026 (2026-06-10T07:05:44+00:00)


We are specifically eyeing Windows and Linux development, and have come up with two differing approaches that both seem to have their merits. The natural Unicode string type is UTF-16 on Windows and UTF-8 on Linux.

We can’t decide which is the best approach:

  1. Standardise on one of the two in all our application logic (and persistent data), and make the other platforms do the appropriate conversions

  2. Use the natural format of the OS for application logic (and thus for making calls into the OS), and convert only at the point of IPC and persistence.

To me they seem like they are both about as good as each other.

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 7:05 am

    “... and UTF-8 in linux.”

    That is mostly true for modern Linux. The actual encoding depends on which API or library is used. Some are hardcoded to use UTF-8, but others (the Qt library, for example) read the LC_ALL, LC_CTYPE or LANG environment variables to detect which encoding to use. So be careful.
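As a rough illustration of that lookup, here is a minimal sketch of how the character-encoding locale can be derived from those three variables. The precedence (LC_ALL, then LC_CTYPE, then LANG) follows POSIX. The function names are hypothetical, and the UTF-8 check is only a heuristic on the locale name, not what any particular library actually does.

```cpp
#include <cctype>
#include <initializer_list>
#include <string>

// Pick the locale name that governs character encoding, using the POSIX
// precedence: LC_ALL wins, then LC_CTYPE, then LANG. Missing or empty
// values are skipped. The values are passed in as arguments so the logic
// is easy to test; real code would pass std::getenv("LC_ALL") and so on.
std::string effective_ctype_locale(const char* lc_all,
                                   const char* lc_ctype,
                                   const char* lang) {
    for (const char* value : {lc_all, lc_ctype, lang}) {
        if (value != nullptr && *value != '\0') {
            return value;
        }
    }
    return "C";  // POSIX default when nothing is set
}

// Heuristic (an assumption of this sketch): treat the locale as UTF-8 if
// its codeset suffix is "UTF-8" or "utf8", as in "en_US.UTF-8".
bool names_utf8_codeset(const std::string& locale_name) {
    const std::string::size_type dot = locale_name.find('.');
    if (dot == std::string::npos) {
        return false;
    }
    std::string codeset = locale_name.substr(dot + 1);
    const std::string::size_type at = codeset.find('@');  // drop "@modifier"
    if (at != std::string::npos) {
        codeset.erase(at);
    }
    std::string normalized;
    for (unsigned char c : codeset) {
        if (c != '-') {
            normalized += static_cast<char>(std::tolower(c));
        }
    }
    return normalized == "utf8";
}
```

A caller would typically write `names_utf8_codeset(effective_ctype_locale(std::getenv("LC_ALL"), std::getenv("LC_CTYPE"), std::getenv("LANG")))` (with `<cstdlib>` included) and fall back to an explicit conversion when the result is false.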

    “We can’t decide whether the best approach”

    As usual, it depends.

    If 90% of the code deals with a platform-specific API in a platform-specific way, it is obviously better to use platform-specific strings. Examples: a device driver or a native iOS application.

    If 90% of the code is complex business logic that is shared across platforms, it is obviously better to use the same encoding on all platforms. Examples: a chat client or a browser.

    In the second case you have a choice:

    • Use a cross-platform library that provides string support (Qt or ICU, for example)
    • Use bare pointers (I consider std::string a “bare pointer” too)

    If working with strings is a significant part of your application, choosing a good string library is a sensible move. For example, Qt has a very solid set of classes that covers 99% of common tasks. Unfortunately, I have no ICU experience, but it also looks very nice.

    When using a string library, you need to care about encoding only when working with external libraries, the platform API, or when sending strings over the network (or to disk). For example, a lot of Cocoa, C# or Qt programmers (all three have solid string support) know very little about encoding details, and that is good, since they can focus on their main task.

    My experience in working with strings is somewhat specific, so I personally prefer bare pointers. Code that uses them is very portable (in the sense that it can easily be reused in other projects and on other platforms) because it has fewer external dependencies. It is also extremely simple and fast (though one probably needs some experience and Unicode background to feel that).

    I agree that the bare-pointers approach is not for everyone. It is good when:

    • You work with entire strings, and splitting, searching and comparing are rare tasks
    • You can use the same encoding in all components and need a conversion only when using the platform API
    • All your supported platforms have an API to:
      • Convert from your encoding to the one used in the API
      • Convert from the API encoding to the one used in your code
    • Pointers are not a problem in your team

    From my somewhat specific experience, this is actually a very common case.
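To make the "convert only at the platform boundary" idea concrete, here is a portable sketch of a UTF-8 to UTF-16 conversion. On Windows one would normally call the platform function `MultiByteToWideChar` with `CP_UTF8` instead; the hand-written version below only shows what such a conversion involves. The function name `utf8_to_utf16` is hypothetical, and for brevity the sketch does not reject overlong forms or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode UTF-8 and re-encode it as UTF-16.
// Throws std::invalid_argument on a bad lead byte, a bad continuation
// byte, a truncated sequence, or a code point above U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("bad UTF-8 lead byte");

        if (in.size() - i <= extra) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("bad UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF) {
            throw std::invalid_argument("code point out of range");
        }
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Everything else: a surrogate pair, i.e. two code units.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this shape, the application logic keeps `std::string` in UTF-8 everywhere and calls the converter only in the thin layer that talks to a UTF-16 API.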

    When working with bare pointers, it is good to choose one encoding that will be used in the entire project (or in all projects).

    From my point of view, UTF-8 is the ultimate winner. If you can’t use UTF-8, use a string library or the platform API for strings. It will save you a lot of time.

    Advantages of UTF-8:

    • Fully ASCII compatible. Any ASCII string is a valid UTF-8 string.
    • The C standard library works great with UTF-8 strings. (*)
    • The C++ standard library works great with UTF-8 (std::string and friends). (*)
    • Legacy code works great with UTF-8.
    • Almost every platform supports UTF-8.
    • Debugging is MUCH easier with UTF-8 (since it is ASCII compatible).
    • No Little-Endian/Big-Endian mess.
    • You will not hit the classic bug “Oh, UTF-16 is not always 2 bytes?”.

    (*) Until you need to compare strings lexically, transform case (toUpper/toLower), change the normalization form or something like that. If you do, use a string library or the platform API.
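A small sketch of why ASCII compatibility makes the standard library usable as-is: every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte-wise search for an ASCII delimiter can never match in the middle of a character. The helper name `split_on_ascii` is hypothetical.

```cpp
#include <string>
#include <vector>

// Split a UTF-8 string on an ASCII delimiter using plain byte search.
// No decoding is needed: ASCII bytes (0x00-0x7F) never occur inside a
// multi-byte UTF-8 sequence, so std::string::find cannot cut a character.
std::vector<std::string> split_on_ascii(const std::string& utf8, char delimiter) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = utf8.find(delimiter, start);
        if (pos == std::string::npos) {
            parts.push_back(utf8.substr(start));  // last (or only) piece
            return parts;
        }
        parts.push_back(utf8.substr(start, pos - start));
        start = pos + 1;
    }
}
```

The same reasoning does not extend to the operations named in the footnote: case mapping and locale-aware comparison need knowledge of the characters themselves, not just their bytes.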

    The disadvantages are debatable:

    • Less compact than UTF-16 for Chinese (and other symbols with large code point numbers).
    • Slightly harder to iterate over symbols.
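Both points can be seen in a few lines. The sketch below counts code points in a UTF-8 string by skipping continuation bytes (those of the form 10xxxxxx). It assumes the input is already valid UTF-8, and the function name `count_code_points` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Count code points in a valid UTF-8 string: every byte that is NOT a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the two-character string 中文 (U+4E2D U+6587), UTF-8 needs three bytes per character, six in total, while UTF-16 needs one 16-bit code unit per character, four bytes in total. That is the size penalty the first bullet refers to.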

    So I recommend using UTF-8 as the common encoding for projects that don’t use a string library.

    But encoding is not the only question you need to answer.

    There is also such a thing as normalization. To put it simply, some letters can be represented in several ways: as a single precomposed character or as a base character followed by combining marks. The common problem is that most string comparison functions treat these representations as different strings. If you are working on a cross-platform project, choosing one normalization form as the standard is the right move. It will save you time.

    For example, if a user’s password contains “йёжиг”, it will be represented differently (in both UTF-8 and UTF-16) when entered on a Mac (which mostly uses Normalization Form D) and on Windows (which mostly prefers Normalization Form C). So if the user registered on Windows with such a password, they will have trouble logging in on a Mac.
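The mismatch is easy to reproduce with the letter “й” alone. The sketch below only demonstrates the problem; it does not normalize anything, since real normalization needs a string library or a platform API. The constant and function names are hypothetical.

```cpp
#include <string>

// "й" as one precomposed code point, U+0439 (the form NFC produces).
const std::string kPrecomposed = "\xD0\xB9";

// "й" as U+0438 (и) followed by U+0306 (combining breve), the form NFD
// produces. It renders the same but is a different byte sequence.
const std::string kDecomposed = "\xD0\xB8\xCC\x86";

// Byte-wise equality, which is what std::string::operator== and strcmp
// perform. It knows nothing about canonical equivalence.
bool same_bytes(const std::string& a, const std::string& b) {
    return a == b;
}
```

`same_bytes(kPrecomposed, kDecomposed)` is false, so a password hash computed over one form will not match the other unless both sides normalize to the same form first.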

    In addition, I would not recommend using wchar_t (or would use it only in Windows code as a UCS-2/UTF-16 char type). The problem with wchar_t is that no encoding is associated with it. It’s just an abstract wide char that is larger than a normal char (16 bits on Windows, 32 bits on most *nix).
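A short sketch of the portability issue: the width of wchar_t is whatever the platform chose, whereas the C++11 types char16_t and char32_t have guaranteed minimum widths and are meant for UTF-16 and UTF-32 code units. The helper `wchar_bits` and the alias names are hypothetical, and preferring these types over wchar_t is one possible reading of the advice above, not something the answer prescribes.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Width of wchar_t in bits on the platform this is compiled for
// (16 on Windows, 32 on most Unix-like systems).
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Unlike wchar_t, these types have a minimum width fixed by the standard.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds a UTF-16 code unit");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds any code point");

// Hypothetical aliases that state the intended encoding in the type name.
using Utf16String = std::u16string;
using Utf32String = std::u32string;
```

Code that must hand strings to a Windows API can still convert to wchar_t at the call site, keeping the encoding-neutral type out of the shared logic.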
