Question

Asked: May 10, 2026

What is the difference between UTF and UCS?

What are the best ways to represent non-European character sets (using UTF) in C++ strings? I would like to know your recommendations for:

Internal representation inside the code
- For string manipulation at run-time
- For using the string for display purposes
Best storage representation (i.e., in a file)
Best on-wire transport format (transfer between applications that may be on different architectures and have a different standard locale)

1 Answer

score 0 · Answer 1 · 2026-05-10T18:16:53+00:00

What is the difference between UTF and UCS?

UCS encodings are fixed width, and are named for how many bytes are used for each character. For example, UCS-2 requires 2 bytes per character. Characters with code points outside the available range can’t be encoded in a UCS encoding; UCS-2, for instance, cannot represent anything above U+FFFF.

UTF encodings are variable width, and are named for the minimum number of bits used to store a character. For example, UTF-16 requires at least 16 bits (2 bytes) per character. Characters with large code points are encoded using more bytes: astral characters (those outside the basic multilingual plane) take 4 bytes in UTF-16.
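To make the variable width concrete, here is a minimal sketch of how a single code point maps to UTF-16 code units. The function name `encode_utf16` is made up for this illustration and is not a standard library call; it also assumes the input is a valid code point and only checks that with an `assert`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (not a surrogate, not above U+10FFFF).
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        // Inside the BMP: one 16-bit code unit, same value as the code point.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Outside the BMP: split the 20 remaining bits across a surrogate pair.
        char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
    return out;
}
```

With this sketch, U+0041 comes out as one code unit (2 bytes), while U+1F600 comes out as the pair D83D DE00 (4 bytes). UCS-2 has no equivalent of the second branch, which is why it stops at U+FFFF.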

Internal representation inside the code

Best storage representation (i.e., in a file)

Best on-wire transport format (transfer between applications that may be on different architectures and have a different standard locale)

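For modern systems, the most reasonable storage and transport encoding is UTF-8. There are special cases where others might be appropriate (UTF-7 for old mail servers, UTF-16 for poorly-written text editors), but UTF-8 is the most common. It is also byte-oriented, so it reads the same on big-endian and little-endian machines and needs no byte-order mark.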
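If the text lives in memory as code points, converting it to UTF-8 before writing it to a file or a socket could look roughly like the sketch below. `to_utf8` is a hypothetical helper name, and the sketch assumes every element of the input is a valid code point (checked only by `assert`); a production converter would need real error handling.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: serialize UCS-4 / UTF-32 text as UTF-8 bytes.
// Assumes every element is a valid scalar value.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: rest of the BMP
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: outside the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting `std::string` is just a byte sequence, so it can be written out as-is regardless of the architecture or locale on the receiving side.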

The preferred internal representation depends on your platform. On Windows, it is UTF-16 (`wchar_t` is 16 bits there). On UNIX-like systems, it is typically UCS-4 (`wchar_t` is usually 32 bits). Each has its good points:

- UTF-16 strings never use more memory than the equivalent UCS-4 string. If you store many large strings with characters primarily in the basic multilingual plane (BMP), UTF-16 will require about half the space of UCS-4. Outside the BMP, it will use the same amount.
- UCS-4 is easier to reason about. Because a UTF-16 character outside the BMP is encoded as a ‘surrogate pair’ of two 16-bit code units, it can be challenging to correctly split or render a string. UCS-4 text does not have this issue. UCS-4 also acts much like ASCII text in ‘char’ arrays, so existing text algorithms can be ported easily.
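The surrogate-pair issue from the list above shows up as soon as you ask how long a string is. A small sketch, assuming well-formed UTF-16 input (`count_code_points` is a name invented for this example):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in well-formed UTF-16 text.
// Low (trailing) surrogates are the second half of a pair, so they are skipped.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        bool is_low_surrogate = (unit >= 0xDC00 && unit <= 0xDFFF);
        if (!is_low_surrogate) {
            ++n;
        }
    }
    return n;
}
```

For the text "a" followed by U+1F600, `std::u16string::size()` reports 3 code units while `count_code_points` reports 2. The same text in a `std::u32string` simply has `size() == 2`, and indexing or splitting at any position never lands in the middle of a character.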

Finally, some systems use UTF-8 as an internal format. This is good if you need to interoperate with existing ASCII- or ISO-8859-based systems, because NUL bytes never appear in the middle of UTF-8 text, whereas they do in UTF-16 and UCS-4.
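The NUL-byte point can be checked by looking at the raw storage of the same short text in each encoding. This is only an illustrative sketch; `count_zero_bytes` is a made-up helper, and it assumes the usual sizes of 2 bytes for `char16_t` and 4 bytes for `char32_t`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count zero bytes in the raw storage of a string.
// Counting zeros gives the same answer on big- and little-endian machines.
template <class CharT>
std::size_t count_zero_bytes(const std::basic_string<CharT>& s) {
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(s.data());
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < s.size() * sizeof(CharT); ++i) {
        if (bytes[i] == 0) {
            ++zeros;
        }
    }
    return zeros;
}
```

For the text "Hi", the UTF-8 `std::string` contains no zero bytes, the `std::u16string` contains 2, and the `std::u32string` contains 6. Any C-style API that treats a zero byte as the end of the string (`strlen`, `strcpy`, and so on) would therefore truncate the UTF-16 or UCS-4 data but pass the UTF-8 data through untouched.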
