I need some help understanding the concept of a well-formed UTF-16 string as mentioned

Asked: June 15, 2026

I need some help understanding the concept of a well-formed UTF-16 string, as described in these two paragraphs from Chapter 2: General Structure, section 2.7 Unicode String:

“Depending on the programming environment, a Unicode string may or may not be required to be in the corresponding Unicode encoding form. For example, strings in Java, C#, or ECMAScript are Unicode 16-bit strings, but are not necessarily well-formed UTF-16 sequences. In normal processing, it can be far more efficient to allow such strings to contain code unit sequences that are not well-formed UTF-16—that is, isolated surrogates. Because strings are such a fundamental component of every program, checking for isolated surrogates in every operation that modifies strings can create significant overhead, especially because supplementary characters are extremely rare as a percentage of overall text in programs worldwide.

Whenever such strings are specified to be in a particular Unicode encoding form—even one with the same code unit size—the string must not violate the requirements of that encoding form. For example, isolated surrogates in a Unicode 16-bit string are not allowed when that string is specified to be well formed UTF-16.”

1 Answer

Editorial Team · Answer 1 · 2026-06-15T11:01:30+00:00

The paragraph explains it for UTF-16: a string that is not well-formed contains isolated surrogate code units.
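To see how an isolated surrogate can appear in an ordinary 16-bit string, here is a small TypeScript sketch (the variable names and the `hex` helper are just for illustration). ECMAScript strings are sequences of 16-bit code units, and operations such as `slice()` work on code units without checking for surrogates, so cutting a supplementary character in half is allowed:

```typescript
// U+1F600 is outside the BMP, so UTF-16 stores it as two code units:
// a high surrogate 0xD83D followed by a low surrogate 0xDE00.
const smile: string = "\u{1F600}";

// slice() counts code units and does not check for surrogates, so cutting
// between the two halves leaves an isolated high surrogate.
const cut: string = smile.slice(0, 1);

// Hypothetical helper: format a code unit as 0xXXXX for display.
const hex = (unit: number): string =>
  "0x" + ("0000" + unit.toString(16)).slice(-4).toUpperCase();

console.log(smile.length);                       // 2 (two code units, one character)
console.log(hex(smile.charCodeAt(0)));           // 0xD83D
console.log(hex(smile.charCodeAt(1)));           // 0xDE00
console.log(cut.length, hex(cut.charCodeAt(0))); // 1 0xD83D (an isolated surrogate)
```

The resulting `cut` is a perfectly valid ECMAScript string, but it is not well-formed UTF-16. This is the kind of string the quoted passage says such environments tolerate for efficiency.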

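That is, certain code units are valid only when they appear in pairs. Code units in the range [0xD800-0xDFFF] are surrogates, and they must occur only as pairs in which the first (high) surrogate is in the range [0xD800-0xDBFF] and the second (low) surrogate is in the range [0xDC00-0xDFFF]. Each such pair encodes one supplementary code point (U+10000 to U+10FFFF). A high surrogate with no low surrogate right after it, a low surrogate with no high surrogate right before it, or a pair in the reverse order is an isolated surrogate, and a string that contains one is not well-formed.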
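The pairing rule can be checked with a single pass over the code units. The following is a minimal TypeScript sketch, not a reference implementation; `isHighSurrogate`, `isLowSurrogate`, `findIsolatedSurrogate`, and `isWellFormedUtf16` are hypothetical helper names chosen for this example:

```typescript
// High (lead) surrogate: 0xD800..0xDBFF.
function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

// Low (trail) surrogate: 0xDC00..0xDFFF.
function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Returns the index of the first isolated surrogate in s, or -1 if s is
// well-formed UTF-16. Works on code units (charCodeAt), not code points.
function findIsolatedSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (isHighSurrogate(unit)) {
      // A high surrogate must be immediately followed by a low surrogate.
      if (i + 1 < s.length && isLowSurrogate(s.charCodeAt(i + 1))) {
        i++; // skip the low half of this valid pair
      } else {
        return i;
      }
    } else if (isLowSurrogate(unit)) {
      // A low surrogate not consumed as the second half of a pair is isolated.
      return i;
    }
  }
  return -1;
}

function isWellFormedUtf16(s: string): boolean {
  return findIsolatedSurrogate(s) === -1;
}

console.log(isWellFormedUtf16("abc"));           // true
console.log(isWellFormedUtf16("a\u{1F600}b"));   // true (valid pair)
console.log(isWellFormedUtf16("a\uD83Db"));      // false (high with no low)
console.log(isWellFormedUtf16("\uDE00\uD83D"));  // false (pair in reverse order)
console.log(findIsolatedSurrogate("a\uDE00"));   // 1
```

Recent ECMAScript editions (ES2024) also provide `String.prototype.isWellFormed()` and `String.prototype.toWellFormed()`, which do this check natively; the hand-written loop is shown only to make the rule explicit.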