The Archive Base

Editorial Team
Asked: May 20, 2026

I would like to convert a QString into either a UTF-8 or a Latin-1 QByteArray

I would like to convert a QString into either a UTF-8 or a Latin-1 QByteArray,
but at the moment I get everything as UTF-8.

I am testing this with a character from the upper half of Latin-1 (above 0x7f);
the German ü is a good example.

If I do this:

QString name("\u00fc"); // U+00FC = ü
QByteArray utf8;
utf8.append(name);
qDebug() << "utf8" << name << utf8.toHex();

QByteArray latin1;
latin1.append(name.toLatin1());
qDebug() << "Latin1" << name << latin1.toHex();

QTextCodec *codec = QTextCodec::codecForName("ISO 8859-1");
QByteArray encodedString = codec->fromUnicode(name);
qDebug() << "ISO 8859-1" << name << encodedString.toHex();

I get the following output.

utf8 "ü" "c3bc" 
Latin1 "ü" "c3bc" 
ISO 8859-1 "ü" "c3bc" 

As you can see, I get the UTF-8 bytes 0xc3 0xbc everywhere, whereas I would expect the Latin-1 byte 0xfc for steps 2 and 3.

My guess is that I should get something like this:

utf8 "ü" "c3bc" 
Latin1 "ü" "fc" 
ISO 8859-1 "ü" "fc" 

What is going on here?

/Thanks


Links to some character tables:

  • http://www.utoronto.ca/web/HTMLdocs/NewHTML/iso_table.html
  • http://www.utf8-zeichentabelle.de/

This code was built and executed on an Ubuntu 10.04 based system.

$> uname -a
Linux frog 2.6.32-28-generic-pae #55-Ubuntu SMP Mon Jan 10 22:34:08 UTC 2011 i686 GNU/Linux
$> env | grep LANG
LANG=en_US.utf8

And if I try to use

utf8.append(name.toUtf8());

I get this output

utf8 "ü" "c383c2bc" 
Latin1 "ü" "c3bc" 
ISO 8859-1 "ü" "c3bc" 

So the Latin-1 result is actually UTF-8 and the UTF-8 result is double encoded…

This must depend on some system settings?


If I run this (could not get the .name() to build)

qDebug() << "system name:"      << QLocale::system().name();
qDebug() << "codecForCStrings:" << QTextCodec::codecForCStrings();
qDebug() << "codecForLocale:"   << QTextCodec::codecForLocale()->name();

Then I get this:

system name: "en_US" 
codecForCStrings: 0x0 
codecForLocale: "System" 

Solution

If I explicitly tell Qt that I am using UTF-8, so that the different classes know about it,
then it works.

QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"));
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));

qDebug() << "system name:"      << QLocale::system().name();
qDebug() << "codecForCStrings:" << QTextCodec::codecForCStrings()->name();
qDebug() << "codecForLocale:"   << QTextCodec::codecForLocale()->name();

QString name("\u00fc"); 
QByteArray utf8;
utf8.append(name);
qDebug() << "utf8" << name << utf8.toHex();

QByteArray latin1;
latin1.append(name.toLatin1());
qDebug() << "Latin1" << name << latin1.toHex();

QTextCodec *codec = QTextCodec::codecForName("latin1");
QByteArray encodedString = codec->fromUnicode(name);
qDebug() << "ISO 8859-1" << name << encodedString.toHex();

Then I get this output:

system name: "en_US" 
codecForCStrings: "UTF-8" 
codecForLocale: "UTF-8" 
utf8 "ü" "c3bc" 
Latin1 "ü" "fc" 
ISO 8859-1 "ü" "fc" 

And that looks as it should.

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 3:20 pm

    Things to know:

    • execution character set

    The C++ standard defines an execution character set, which determines how string and character literals are encoded in the binary produced by the compiler. You can read about it in subsection 1.1 Character sets of section 1 Overview in The C Preprocessor manual on the http://gcc.gnu.org site.

    Question:
    What will be produced as a result of "\u00fc" string literal?

    Answer:
    It depends on the execution character set. With gcc (which is what you’re using) it is UTF-8 by default, unless you specify something different with the -fexec-charset option. You can read about this and other options controlling the preprocessing phase in subsection 3.11 Options Controlling the Preprocessor of section 3 GCC Command Options in the GCC manual on the http://gcc.gnu.org site. Now that we know the execution character set is UTF-8, we know that "\u00fc" will be translated to the UTF-8 encoding of the Unicode code point U+00FC, which is the two-byte sequence 0xc3 0xbc.
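To make the two bytes concrete, here is a minimal sketch in plain C++ (no Qt) that encodes a code point as UTF-8 by hand. The helpers `encodeUtf8` and `toHex` are hypothetical names introduced only for this illustration; they are not part of GCC or Qt.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Surrogate code points are not rejected in this sketch.
std::string encodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // precondition: must be inside the Unicode range
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: render bytes as lowercase hex, in the spirit of QByteArray::toHex().
std::string toHex(const std::string &bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    for (unsigned char b : bytes) {
        hex += digits[b >> 4];
        hex += digits[b & 0x0F];
    }
    return hex;
}
```

With these helpers, `toHex(encodeUtf8(0xFC))` evaluates to `c3bc`, the same two bytes described above for "\u00fc" under a UTF-8 execution character set.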

    • QString::QString ( const char * str ) and QByteArray & QByteArray::append ( const QString & str ) depend on global state

    The QString constructor taking char * calls QString QString::fromAscii ( const char * str, int size = -1 ), which uses the codec set with void QTextCodec::setCodecForCStrings ( QTextCodec * codec ) if one has been set, and otherwise does the same thing as QString QString::fromLatin1 ( const char * str, int size = -1 ).

    Question:
    Which codec will the QString constructor use to decode the two-byte sequence (0xc3 0xbc) it receives?

    Answer:
    By default no codec is set with QTextCodec::setCodecForCStrings(), so Latin1 is used to decode the byte sequence. As 0xc3 and 0xbc are both valid in Latin 1, representing Ã and ¼ respectively (this should already be familiar to you, as it was taken directly from this answer to your earlier question), we get a QString containing these two characters.
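The following plain C++ sketch (no Qt) walks through that decoding step and then re-encodes the result. All helper names are hypothetical and exist only for this illustration; the `'?'` replacement for unrepresentable characters is an assumption of the sketch, not a statement about Qt's behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Latin-1 decoding: every byte maps to the code point with the same numeric value.
std::vector<std::uint32_t> decodeLatin1(const std::string &bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Latin-1 encoding; code points above U+00FF become '?' in this sketch.
std::string encodeLatin1(const std::vector<std::uint32_t> &cps) {
    std::string out;
    for (std::uint32_t cp : cps) out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    return out;
}

// UTF-8 encoding, deliberately limited to code points below U+0800 (one or two bytes).
std::string encodeUtf8(const std::vector<std::uint32_t> &cps) {
    std::string out;
    for (std::uint32_t cp : cps) {
        assert(cp < 0x800);  // this sketch only handles 1- and 2-byte sequences
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Render bytes as lowercase hex.
std::string toHex(const std::string &bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    for (unsigned char b : bytes) {
        hex += digits[b >> 4];
        hex += digits[b & 0x0F];
    }
    return hex;
}
```

Decoding the bytes `"\xc3\xbc"` with `decodeLatin1` gives the two code points U+00C3 and U+00BC. Encoding those back with `encodeLatin1` yields `c3bc`, and with `encodeUtf8` yields `c383c2bc`, which is consistent with the "Latin1" and double-encoded "utf8" outputs shown in the question. A string that really holds U+00FC would instead give `fc` from `encodeLatin1`.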

    • qDebug() is not 8-bit clean

    You shouldn’t use the QDebug class to output anything outside of ASCII; there is no guarantee about what you will get.

    Test program:

    #include <QtCore>
    
    void dbg(char const * rawInput, QString s) {
    
        QString codepoints;
        foreach(QChar chr, s) {
            codepoints.append(QString::number(chr.unicode(), 16)).append(" ");
        }
    
        qDebug() << "Input: " << rawInput
                 << ", "
                 << "Unicode codepoints: " << codepoints;
    }
    
    int main(int argc, char *argv[])
    {
        QCoreApplication app(argc, argv);
    
        qDebug() << "system name:"
                 << QLocale::system().name();
    
        for (int i = 1; i <= 5; ++i) {
    
            switch(i) {
    
            case 1:
                qDebug() << "\nWithout codecForCStrings (default is Latin1)\n";
                break;
            case 2:
                qDebug() << "\nWith codecForCStrings set to UTF-8\n";
                QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));
                break;
            case 3:
                qDebug() << "\nWithout codecForCStrings (default is Latin1), with codecForLocale set to UTF-8\n";
                QTextCodec::setCodecForCStrings(0);
                QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"));
                break;
            case 4:
                qDebug() << "\nWithout codecForCStrings (default is Latin1), with codecForLocale set to Latin1\n";
                QTextCodec::setCodecForCStrings(0);
                QTextCodec::setCodecForLocale(QTextCodec::codecForName("Latin1"));
                break;
            }
    
            qDebug() << "codecForCStrings:" << (QTextCodec::codecForCStrings()
                                               ? QTextCodec::codecForCStrings()->name()
                                               : "NOT SET");
            qDebug() << "codecForLocale:"   << (QTextCodec::codecForLocale()
                                               ? QTextCodec::codecForLocale()->name()
                                               : "NOT SET");
    
            qDebug() << "\n1. Using QString::QString(char const *)";
            dbg("\\u00fc", QString("\u00fc"));
            dbg("\\xc3\\xbc", QString("\xc3\xbc"));
            dbg("LATIN SMALL LETTER U WITH DIAERESIS", QString("ü"));
    
            qDebug() << "\n2. Using QString::fromUtf8(char const *)";
            dbg("\\u00fc", QString::fromUtf8("\u00fc"));
            dbg("\\xc3\\xbc", QString::fromUtf8("\xc3\xbc"));
            dbg("LATIN SMALL LETTER U WITH DIAERESIS", QString::fromUtf8("ü"));
    
            qDebug() << "\n3. Using QString::fromLocal8Bit(char const *)";
            dbg("\\u00fc", QString::fromLocal8Bit("\u00fc"));
            dbg("\\xc3\\xbc", QString::fromLocal8Bit("\xc3\xbc"));
            dbg("LATIN SMALL LETTER U WITH DIAERESIS", QString::fromLocal8Bit("ü"));
        }
    
        return 0; // all output is already printed; no event loop is needed
    }
    

    Output on mingw 4.4.0 on Windows XP:

    system name: "pl_PL"
    
    Without codecForCStrings (default is Latin1)
    
    codecForCStrings: "NOT SET"
    codecForLocale: "System"
    
    1. Using QString::QString(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "c3 bc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "c3 bc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fc "
    
    2. Using QString::fromUtf8(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "fc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "fc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fffd "
    
    3. Using QString::fromLocal8Bit(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "102 13d "
    Input:  \xc3\xbc ,  Unicode codepoints:  "102 13d "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fc "
    
    With codecForCStrings set to UTF-8
    
    codecForCStrings: "UTF-8"
    codecForLocale: "System"
    
    1. Using QString::QString(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "fc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "fc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fffd "
    
    2. Using QString::fromUtf8(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "fc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "fc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fffd "
    
    3. Using QString::fromLocal8Bit(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "102 13d "
    Input:  \xc3\xbc ,  Unicode codepoints:  "102 13d "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fc "
    
    Without codecForCStrings (default is Latin1), with codecForLocale set to UTF-8
    
    codecForCStrings: "NOT SET"
    codecForLocale: "UTF-8"
    
    1. Using QString::QString(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "c3 bc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "c3 bc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fc "
    
    2. Using QString::fromUtf8(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "fc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "fc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fffd "
    
    3. Using QString::fromLocal8Bit(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "fc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "fc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fffd "
    
    Without codecForCStrings (default is Latin1), with codecForLocale set to Latin1
    
    codecForCStrings: "NOT SET"
    codecForLocale: "ISO-8859-1"
    
    1. Using QString::QString(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "c3 bc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "c3 bc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fc "
    
    2. Using QString::fromUtf8(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "fc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "fc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fffd "
    
    3. Using QString::fromLocal8Bit(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "c3 bc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "c3 bc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fc "
    codecForCStrings: "NOT SET"
    codecForLocale: "ISO-8859-1"
    
    1. Using QString::QString(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "c3 bc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "c3 bc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fc "
    
    2. Using QString::fromUtf8(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "fc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "fc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fffd "
    
    3. Using QString::fromLocal8Bit(char const *)
    Input:  \u00fc ,  Unicode codepoints:  "c3 bc "
    Input:  \xc3\xbc ,  Unicode codepoints:  "c3 bc "
    Input:  LATIN SMALL LETTER U WITH DIAERESIS ,  Unicode codepoints:  "fc "
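
The pattern in section 1 of the output above can be mimicked with a small model in plain C++ (no Qt). This is only a sketch of the dispatch described earlier, not Qt's actual implementation: `g_codecForCStrings`, `fromAsciiModel` and the two decoders are hypothetical names. It also assumes the source file stored the literal ü as the single byte 0xfc, which is what the "fc" result under Latin-1 decoding suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using CodePoints = std::vector<std::uint32_t>;
using Decoder = CodePoints (*)(const std::string &);

// Latin-1: each byte is the code point with the same value.
CodePoints decodeLatin1(const std::string &bytes) {
    CodePoints out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Simplified UTF-8 decoder: malformed input becomes U+FFFD.
// Overlong forms and surrogates are not rejected in this sketch.
CodePoints decodeUtf8(const std::string &bytes) {
    CodePoints out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        std::uint32_t cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        bool ok = (i + extra < bytes.size());            // enough bytes left?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;          // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}

// Hypothetical stand-in for the process-wide codec set by setCodecForCStrings().
static Decoder g_codecForCStrings = nullptr;

// Model of the dispatch: use the global codec if one is set, otherwise Latin-1.
CodePoints fromAsciiModel(const std::string &bytes) {
    return g_codecForCStrings ? g_codecForCStrings(bytes) : decodeLatin1(bytes);
}
```

With no codec set, `fromAsciiModel("\xc3\xbc")` gives the code points c3 and bc. After `g_codecForCStrings = decodeUtf8;` the same call gives fc, while the single byte `"\xfc"` gives fffd. The same input bytes therefore produce different strings depending on global state, which is why an explicit conversion such as QString::fromUtf8() gives the same result in every run of the test above.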
    

    I’d like to thank thiago, cbreak, peppe and heinz from the #qt channel on the freenode.org IRC network for explaining the issues involved here and helping me understand them.
