I have an NSString and a unicode code point range that represents a specific

Question


Asked: June 9, 2026


I have an NSString and a Unicode code point range that represents a specific section of the text in that NSString. Since the characters in that NSString do not correspond one-to-one with code points, I need to somehow convert my code point range into the corresponding character range. How do I do this?

I know I can use the NSString method -rangeOfComposedCharacterSequencesForRange: to convert a character range to a grapheme cluster range, but what I want to do is sort of the opposite of that, and I can’t find an inverse of that method in the APIs. And even if such a method were available, I don’t think it would be exactly what I’m looking for, since (if I understand this correctly) a grapheme cluster is not the same thing as a Unicode code point, and can in fact be composed of more than one code point.


1 Answer

Editorial Team · Answer 1 · 2026-06-09T15:19:59+00:00

What you have is mixed data from two different worlds. A Unicode code point range would typically come with a UTF-32 string, where code points and array elements correspond one-to-one, so extracting the substring would be trivial. You have two options:

1. Work in the UTF-32 world before you put the data into an NSString
2. Convert the Unicode code point range into a UTF-16 unit range

I assume from your question that #2 is the easiest option in your case.
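For comparison, option 1 might look roughly like the sketch below: slice the code points while the text is still UTF-32, then encode only that slice as UTF-16. The function name and the plain-array interface are made up for illustration, and the code assumes every input value is a valid Unicode scalar value (no surrogates, nothing above 0x10FFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code points [start, start + count) of a
   UTF-32 buffer as UTF-16. `out` must have room for 2 * count units.
   Returns the number of UTF-16 units written.
   Assumption: every input value is a valid Unicode scalar value. */
size_t encode_utf32_slice_as_utf16(const uint32_t *text, size_t start,
                                   size_t count, uint16_t *out)
{
    size_t written = 0;
    for (size_t i = start; i < start + count; i++) {
        uint32_t cp = text[i];
        if (cp < 0x10000) {
            /* BMP code point: exactly one UTF-16 unit. */
            out[written++] = (uint16_t)cp;
        } else {
            /* Supplementary code point: a lead/trail surrogate pair. */
            cp -= 0x10000;
            out[written++] = (uint16_t)(0xD800 + (cp >> 10));
            out[written++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }
    return written;
}
```

The resulting buffer could then be handed to +[NSString stringWithCharacters:length:], since unichar is a 16-bit UTF-16 unit.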

As you say, characters in an NSString do not correspond one-to-one with Unicode code points, since an NSString character is a UTF-16 unit. However, a Unicode code point corresponds to exactly 1 or 2 characters in an NSString. You can fairly easily write your own range conversion routine by iterating through the NSString characters and counting Unicode code points. This is made somewhat easier by two facts: you don’t need to care about the endianness of the UTF-16 data, because NSString hands you each character as a unichar value in native byte order, and the value ranges for BMP characters, lead surrogates, and trail surrogates are disjoint, so each character can be classified on its own. CFString provides some functions to determine what each character is. So in pseudocode your counting would look like:

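for each NSString character {
    if (CFStringIsSurrogateHighCharacter(character) &&
        a next character exists &&
        CFStringIsSurrogateLowCharacter(next character))
    {
        // Lead + trail surrogate: one code point stored as two characters
        Skip forward another character in the NSString
    }
    // An unpaired surrogate is counted as one code point by itself
    Increment count of Unicode code points stepped through
}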
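The counting loop above can be turned into a range conversion along the following lines. This is only a sketch in plain C over a UTF-16 buffer (for example, one filled with -getCharacters:range:). All function names here are hypothetical, and the two inline surrogate checks are stand-ins that test the same value ranges as CFStringIsSurrogateHighCharacter and CFStringIsSurrogateLowCharacter. An unpaired surrogate is counted as a single code point, which is an assumption about how the code point range was produced.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for CFStringIsSurrogateHighCharacter / ...LowCharacter. */
static bool is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static bool is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: UTF-16 index at which code point number `cp_index`
   starts. Returns `len` if the buffer holds fewer code points than that. */
static size_t utf16_offset_for_code_point(const uint16_t *units, size_t len,
                                          size_t cp_index)
{
    size_t i = 0;
    while (i < len && cp_index > 0) {
        if (is_high_surrogate(units[i]) && i + 1 < len &&
            is_low_surrogate(units[i + 1])) {
            i += 2;   /* surrogate pair: one code point, two units */
        } else {
            i += 1;   /* BMP character, or an unpaired surrogate */
        }
        cp_index--;
    }
    return i;
}

/* Convert a code point range (location, length) into the matching
   UTF-16 unit range. Ranges past the end are clamped to the buffer. */
void utf16_range_for_code_point_range(const uint16_t *units, size_t len,
                                      size_t cp_location, size_t cp_length,
                                      size_t *out_location, size_t *out_length)
{
    size_t start = utf16_offset_for_code_point(units, len, cp_location);
    size_t span  = utf16_offset_for_code_point(units + start, len - start,
                                               cp_length);
    *out_location = start;
    *out_length   = span;
}
```

The two outputs map directly onto the location and length of an NSRange, which can then be passed to -substringWithRange:. The walk is linear in the length of the string, so if many ranges need converting it may be worth converting them in a single pass instead.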
