I have an NSString and a Unicode code point range that represents a specific section of the text in that NSString. Since the characters in that NSString do not correspond one-to-one with code points, I need to somehow convert my code point range into the corresponding character range. How do I do this?
I know I can use the NSString method -rangeOfComposedCharacterSequencesForRange: to convert a character range to a grapheme cluster range, but what I want is sort of the opposite of that, and I can’t find an inverse of that method in the APIs. And even if there were such a method, I don’t think it is exactly what I’m looking for, since (if I understand this correctly) a grapheme cluster is not the same thing as a Unicode code point and can in fact be composed of more than one code point.
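To make the distinction in the question concrete, here is a small sketch in plain C (so it compiles anywhere, including inside an Objective-C file). It treats the string as a raw buffer of UTF-16 units, which is what NSString's characters are. `count_code_points` is a hypothetical helper, not a Foundation API, and it assumes a lone (unpaired) surrogate counts as one code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts Unicode code points in a buffer of UTF-16 units.
   A valid high+low surrogate pair counts as one code point; every other
   unit (including a lone surrogate, by assumption) counts as one. */
static size_t count_code_points(const uint16_t *s, size_t len) {
    assert(s != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++; /* skip the trail surrogate of the pair */
        count++;
    }
    return count;
}
```

For the units {0x0065, 0x0301, 0xD83D, 0xDE00} ("e" + COMBINING ACUTE ACCENT, then U+1F600), the buffer holds 4 UTF-16 units and 3 code points, but a user sees 2 grapheme clusters: the accented "é" is two code points in one cluster, while the emoji is one code point stored in two units.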
What you have is kind of mixed data from two different worlds. Typically you would get a Unicode code point range together with a UTF-32 string (where the correspondence is one-to-one), so extracting the substring would be trivial. You have two options:
I assume from your question that #2 is the easiest option in your case.
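As you say, characters in an NSString do not correspond one-to-one with Unicode code points, since an NSString character is a UTF-16 code unit. However, a Unicode code point corresponds to exactly one or two characters in an NSString: code points in the Basic Multilingual Plane (BMP) take one unit, and code points above U+FFFF take two (a surrogate pair). You can fairly easily write your own range conversion routine by iterating through the NSString characters and counting Unicode code points. This is made easier by the fact that each UTF-16 unit can be classified on its own, because non-surrogate BMP characters, lead (high) surrogates (U+D800 to U+DBFF), and trail (low) surrogates (U+DC00 to U+DFFF) occupy disjoint ranges. You don't need to care about endianness either, since -characterAtIndex: returns each unit as a native unichar value. CFString provides functions such as CFStringIsSurrogateHighCharacter() and CFStringIsSurrogateLowCharacter() to determine what each character is. So in pseudocode, your counting would look like: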
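The counting described above can also be sketched in plain C, which is valid inside an Objective-C file. This is a minimal sketch under stated assumptions, not the answer's own pseudocode: it works on a raw buffer of UTF-16 units, `CPRange` is a hypothetical stand-in for `NSRange`, `cp_range_to_utf16_range` is a hypothetical helper name, and a lone surrogate is assumed to count as one code point. In Objective-C you would fill a `unichar` buffer with `-getCharacters:range:` and build the result with `NSMakeRange`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Foundation's NSRange so the sketch is plain C. */
typedef struct {
    size_t location;
    size_t length;
} CPRange;

/* Same range checks as CFStringIsSurrogateHighCharacter / ...LowCharacter. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Number of UTF-16 units used by the code point starting at index i:
   2 for a valid surrogate pair, otherwise 1 (lone surrogates included). */
static size_t units_at(const uint16_t *s, size_t len, size_t i) {
    if (is_high_surrogate(s[i]) && i + 1 < len && is_low_surrogate(s[i + 1]))
        return 2;
    return 1;
}

/* Converts a range measured in code points into the matching range of
   UTF-16 units. Returns 0 on success, -1 if the range runs past the end. */
static int cp_range_to_utf16_range(const uint16_t *s, size_t len,
                                   CPRange cp, CPRange *out) {
    assert((s != NULL || len == 0) && out != NULL);
    size_t unit = 0;
    /* Skip the code points before the start of the range. */
    for (size_t n = 0; n < cp.location; n++) {
        if (unit >= len) return -1;
        unit += units_at(s, len, unit);
    }
    size_t start = unit;
    /* Walk over the code points inside the range. */
    for (size_t n = 0; n < cp.length; n++) {
        if (unit >= len) return -1;
        unit += units_at(s, len, unit);
    }
    out->location = start;
    out->length = unit - start;
    return 0;
}
```

For example, for the units {0x0061, 0xD83D, 0xDE00, 0x0062} ("a😀b"), the code point range {1, 2} maps to the UTF-16 range {1, 3}, because the emoji occupies two units. A single forward pass is enough since each unit's role can be decided from itself and, for a high surrogate, the unit that follows.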