While fetching a UTF-8-encoded file over the network using the NSURLConnection class, there’s a good chance the delegate’s connection:didReceiveData: message will be sent with an NSData that cuts the UTF-8 file off mid-character. UTF-8 is a multi-byte encoding scheme, so a single character can be split across two separate NSData objects.
In other words, if I join all the data I get from connection:didReceiveData: I will have a valid UTF-8 file, but each separate chunk of data is not necessarily valid UTF-8.
I do not want to store all the downloaded file in memory.
What I want is: given an NSData, decode whatever you can into an NSString. If the last
few bytes of the NSData are an incomplete multi-byte sequence, tell me, so I can save them for the next NSData.
One obvious solution is repeatedly trying to decode using initWithData:encoding:, each time truncating the last byte, until success. This, unfortunately, can be very wasteful.
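If you want to make sure that you don’t stop in the middle of a UTF-8 multi-byte sequence, you’re going to need to look at the end of the byte array and check the top 2 bits of the last few bytes. A byte whose top two bits are 10 is a continuation byte; a byte whose top two bits are 11 starts a multi-byte sequence, and its leading bits tell you how many bytes that sequence should have.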
Look at the multi-byte table in the Wikipedia entry: http://en.wikipedia.org/wiki/UTF-8
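To make the bit check concrete, here is a minimal sketch in plain C (which can be called directly from Objective-C). The function name utf8_incomplete_tail is hypothetical, not part of any Apple API. It only inspects the last few bytes and assumes the rest of the buffer is well-formed UTF-8; anything it cannot classify is left for the decoder to reject.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns how many bytes at the end of buf belong to
   an incomplete UTF-8 sequence, or 0 if the buffer ends on a character
   boundary (or ends in something malformed that the decoder should report). */
size_t utf8_incomplete_tail(const uint8_t *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Walk back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return 0; /* empty buffer, or no lead byte in sight */

    uint8_t lead = buf[i - 1];
    size_t need;
    if ((lead & 0x80) == 0x00)      need = 1; /* 0xxxxxxx: ASCII */
    else if ((lead & 0xE0) == 0xC0) need = 2; /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0) need = 3; /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0) need = 4; /* 11110xxx */
    else
        return 0; /* not a valid lead byte; leave it to the decoder */

    size_t have = back + 1; /* lead byte plus the continuation bytes seen */
    return have < need ? have : 0;
}
```

In connection:didReceiveData: you could call this with the data’s bytes and length, decode the first length minus tail bytes with initWithBytes:length:encoding:, and keep the remaining tail bytes (never more than 3) to prepend to the next NSData. That avoids both buffering the whole file and the repeated decode attempts.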