Currently, I am developing an app for a China customer. China customer are mostly

Question

Asked: May 14, 2026 (2026-05-14T00:44:58+00:00)

Currently, I am developing an app for a customer in China. Most customers in China set their OS encoding to GB2312. I need to write a text file that is encoded in GB2312.

I use std::ofstream to write the file.
I compile my application in MBCS mode, not Unicode.
I use the following code to convert a CString to std::string, and then write it to the file using ofstream:

std::string Utils::ToString(const CString& cString) {
    /* Only valid in an MBCS build, where LPCTSTR is const char*.
       Will not work correctly if we are compiled in Unicode mode. */
    return (LPCTSTR)cString;
}

To my surprise, it just works. I thought I would at least need to use wstring, so I did some investigation.

Here is the generated MBCS.txt:

[Image: MBCS.txt] http://sites.google.com/site/yanchengcheok/Home/stackoverflow0.PNG

I print a single character, 脚 (its GB2312 value is 0xBDC5).
When I use a CString to hold this character, its length is 2.
When I use Utils::ToString to convert it to std::string, the returned string length is 2.
I write it to the file using std::ofstream.

My questions are:

1. When I examine MBCS.txt in a hex editor, the bytes are displayed as BD followed by C5. But I am on a little-endian machine. Shouldn't the hex editor show C5 followed by BD? I checked Wikipedia, and GB2312 does not seem to specify an endianness.
2. It seems that using std::string + CString works fine for my case. In what cases will this approach not work, and when should I start using wstring?


1 Answer

Editorial Team · Answer 1 · 2026-05-14T00:44:58+00:00

About 1: Endianness is a problem you meet when you serialize a unit in terms of smaller units (for example, serializing 16-bit units as octets). I'm far from being a specialist in CJK encodings, but it seems to me that GB2312 is a coded character set that can be used with several encoding schemes. The encoding schemes cited on the Wikipedia page as being used for GB2312 (ISO 2022, EUC-CN and HZ) are all defined in terms of octets, so there is no endianness issue when they are serialized as octets. The value 0xBDC5 is simply the octet BD followed by the octet C5, and that is the order in which they appear in the file.
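As a rough illustration of that point, here is a minimal sketch in standard C++ (no MFC) that writes the two octets of 脚 through std::ofstream and reads them back. The helper name WriteAndReadBack and the file name used in the usage note are made up for this example, and the octets BD C5 are taken from the question rather than produced by a real GB2312 encoder.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: writes the octets of `text` to `path` unchanged,
// then reads the file back and returns its octets in file order.
std::vector<unsigned char> WriteAndReadBack(const std::string& path,
                                            const std::string& text) {
    {
        // Binary mode so that no newline translation touches the octets.
        std::ofstream out(path, std::ios::binary);
        assert(out.is_open());
        out.write(text.data(), static_cast<std::streamsize>(text.size()));
    }  // The stream is flushed and closed here.

    std::ifstream in(path, std::ios::binary);
    assert(in.is_open());
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Calling WriteAndReadBack("mbcs_demo.txt", "\xBD\xC5") should give back two octets, BD then C5, on either a little-endian or a big-endian machine, because a std::string is already a sequence of octets and nothing wider than one octet is ever serialized.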

Contrast this with the Unicode encoding schemes: UTF-8 is defined in terms of octets and has no endianness issue when serialized as octets; UTF-16 is defined in terms of 16-bit units, so endianness must be specified when it is serialized as octets; UTF-32 is defined in terms of 32-bit units, so endianness must likewise be specified when it is serialized as octets.
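To make the UTF-16 case concrete, the following sketch serializes one 16-bit code unit as two octets in either byte order. It assumes that 脚 is U+811A in Unicode; the names ByteOrder, SerializeUtf16Unit and ParseUtf16Unit are invented for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical enum naming the two possible octet orders.
enum class ByteOrder { Little, Big };

// Splits one 16-bit code unit into two octets in the requested order.
std::vector<std::uint8_t> SerializeUtf16Unit(std::uint16_t unit,
                                             ByteOrder order) {
    const std::uint8_t low = static_cast<std::uint8_t>(unit & 0xFF);
    const std::uint8_t high = static_cast<std::uint8_t>(unit >> 8);
    if (order == ByteOrder::Little) {
        return {low, high};  // UTF-16LE: low octet first
    }
    return {high, low};      // UTF-16BE: high octet first
}

// Rebuilds the code unit; the reader has to know which order was used.
std::uint16_t ParseUtf16Unit(const std::vector<std::uint8_t>& octets,
                             ByteOrder order) {
    assert(octets.size() == 2);
    const std::uint16_t first = octets[0];
    const std::uint16_t second = octets[1];
    if (order == ByteOrder::Little) {
        return static_cast<std::uint16_t>((second << 8) | first);
    }
    return static_cast<std::uint16_t>((first << 8) | second);
}
```

Under that assumption, SerializeUtf16Unit(0x811A, ByteOrder::Little) yields 1A 81 and ByteOrder::Big yields 81 1A. Parsing the little-endian octets as big-endian gives 0x1A81, a different code unit, which is why UTF-16 files need a declared byte order (or a byte order mark) while an octet-based encoding such as EUC-CN does not.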
