The Qt documentation states that (among others) the following Unicode string encodings are supported:
- UTF-8
- UTF-16
- UTF-16BE
- UTF-16LE
- UTF-32
- UTF-32BE
- UTF-32LE
Since three different codecs are listed for each of the 2- and 4-octet Unicode encodings, I was wondering: how do the two codecs without an explicit endianness (“UTF-16” and “UTF-32”) decide which byte order to use?
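To make the question concrete, here is a minimal sketch in plain standard C++ (no Qt). The BE/LE variants fix the byte order of each code unit, while an unmarked "UTF-16" stream needs either a BOM or a default. All names here (`ByteOrder`, `hostOrder`, `encodeUtf16`, `detectUtf16Order`) are hypothetical and are not Qt's API. The "BOM wins, otherwise use the host byte order" rule in `detectUtf16Order` is an assumption chosen to mirror the behaviour the answer attributes to Qt's UTF-16/UTF-32 codecs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Byte orders a UTF-16 stream can use (hypothetical names, not Qt's enum).
enum class ByteOrder { Big, Little };

// Byte order of the machine running this code, found by inspecting how a
// known 16-bit value is laid out in memory.
ByteOrder hostOrder() {
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // copy the lowest-addressed byte
    return first == 0x01 ? ByteOrder::Big : ByteOrder::Little;
}

// Serialize UTF-16 code units with an explicit byte order, as the
// "UTF-16BE" / "UTF-16LE" encodings do (no BOM is written).
std::vector<unsigned char> encodeUtf16(const std::vector<char16_t>& units,
                                       ByteOrder order) {
    std::vector<unsigned char> out;
    out.reserve(units.size() * 2);
    for (char16_t u : units) {
        const unsigned char hi = static_cast<unsigned char>(u >> 8);
        const unsigned char lo = static_cast<unsigned char>(u & 0xFF);
        if (order == ByteOrder::Big) {
            out.push_back(hi);
            out.push_back(lo);
        } else {
            out.push_back(lo);
            out.push_back(hi);
        }
    }
    return out;
}

// Decide how to read an unmarked "UTF-16" stream: a leading BOM (U+FEFF)
// wins; without one, fall back to the host byte order (assumed policy).
// bomLength receives the number of bytes the caller should skip.
ByteOrder detectUtf16Order(const std::vector<unsigned char>& bytes,
                           std::size_t& bomLength) {
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) { bomLength = 2; return ByteOrder::Big; }
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) { bomLength = 2; return ByteOrder::Little; }
    }
    bomLength = 0;
    return hostOrder();
}
```

For example, U+20AC (€) becomes the bytes `20 AC` in UTF-16BE and `AC 20` in UTF-16LE; a stream starting with `FF FE` is read as little-endian regardless of the machine. The same idea carries over to UTF-32 with 4-byte code units and the BOMs `00 00 FE FF` / `FF FE 00 00`.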
Based on the source code in `src/corelib/codecs/`, it seems Qt uses the byte order of the host for UTF-16 and UTF-32.

If you use `QTextCodec` to read an existing Unicode string that has a BOM, and you didn't explicitly ask to ignore the header (the `QTextCodec::IgnoreHeader` conversion flag), the byte order indicated by the BOM is used instead.

In *qutfcodec_p.h*, both `QUtf16Codec::e` and `QUtf32Codec::e` are initialized with the value `DetectEndianness` (an enum value).

In *qutfcodec.cpp*, near the beginning of the functions `convertFromUnicode` and `convertToUnicode` of the classes `QUtf16` and `QUtf32` (used by `QUtf16Codec` and `QUtf32Codec`), you can find the line: