My app downloads a file in UTF-8 format, which is too large to read using the NSString initWithContentsOfFile method. The problem I have is that the NSFileHandle readDataOfLength method reads a specified number of bytes, and I may end up only reading part of a UTF-8 character. What is the best solution here?
LATER:
Let it be recorded in the ship’s log that the following code works:
    NSData *buf = [NSData dataWithContentsOfFile:path
                                         options:NSDataReadingMappedIfSafe
                                           error:nil];
    NSString *data = [[[NSString alloc]
        initWithBytesNoCopy:(void *)buf.bytes
                     length:buf.length
                   encoding:NSUTF8StringEncoding
               freeWhenDone:NO] autorelease];
My main problem was actually to do with the encoding, not the task of reading the file.
You can use NSData's +dataWithContentsOfFile:options:error: with the NSDataReadingMappedIfSafe option to map your file into memory rather than loading it. That uses the virtual memory manager in iOS to page bits of the file in and out of RAM on demand, in much the same way that a desktop OS handles its on-disk virtual memory file. So you don't need enough RAM to keep the entire file in memory at once; you just need the file to be small enough to fit in the processor's address space (so, gigabytes). You'll get an object that acts exactly like a normal NSData, which should save you most of the hassle of using an NSFileHandle and streaming manually.

You'll probably then need to convert portions to NSString, since you can realistically expect NSString to convert from UTF-8 to another internal format (though it might not; it's worth having a go with -initWithData:encoding: and seeing whether NSString is smart enough just to keep a reference to the original data and to expand from UTF-8 on demand), which I think is what your question is really getting at.

I'd suggest you use -initWithBytes:length:encoding: to convert a reasonable number of bytes to a string. You can then use -lengthOfBytesUsingEncoding: to find out how many bytes it actually made sense of and advance your read pointer appropriately. This relies on the assumption that NSString will discard any partial characters at the end of the bytes you provide.

EDIT: so, something like:
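For what it's worth, the NSString documentation says the byte-based initializers return nil when the bytes are not valid for the encoding, so a chunk that ends mid-character may be rejected outright rather than trimmed. A defensive option is to cut each chunk back to a character boundary before converting it. Below is a minimal sketch in plain C (which compiles unchanged in an Objective-C file); the name utf8_complete_prefix_length is made up for illustration, and it assumes the chunk starts on a character boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns how many leading bytes of buf can be decoded
 * without cutting the final UTF-8 character in half. It only inspects the
 * last few bytes and does not validate the rest of the buffer. */
size_t utf8_complete_prefix_length(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len == 0) return 0;

    /* Step back over at most three trailing continuation bytes (10xxxxxx). */
    size_t i = len;
    size_t steps = 0;
    while (i > 0 && steps < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        steps++;
    }
    if (i == 0) return len; /* no lead byte in view: leave it to the decoder */

    /* buf[i - 1] should now be the lead byte of the last character. */
    unsigned char lead = buf[i - 1];
    size_t need;
    if (lead < 0x80)                need = 1; /* ASCII */
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return len; /* malformed: leave it to the decoder */

    size_t have = len - (i - 1); /* bytes present for the last character */
    return (have < need) ? (i - 1) : len;
}
```

The return value can be passed as the length: argument to -initWithBytes:length:encoding:, and the read offset advanced by the same amount, so the incomplete tail is picked up at the start of the next chunk.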
Of course, an implicit assumption is that all UTF-8 encodings are unique, which I have to admit I'm not knowledgeable enough to say for absolute certain.
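On the uniqueness point: as far as the UTF-8 definition goes (RFC 3629 and the Unicode Standard), only the shortest encoding of a code point is well-formed, so valid UTF-8 has exactly one byte sequence per code point. "Overlong" forms such as C0 80 for U+0000 exist as bit patterns but are supposed to be rejected by decoders. The sketch below, again in plain C, illustrates that rule; both function names are invented for this example, and whether NSString rejects every overlong form is not something checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 uses for code point cp, or 0 if cp cannot be
 * encoded (surrogates, or values above U+10FFFF). */
size_t utf8_encoded_length(uint32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* UTF-16 surrogates */
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Returns 1 if the n bytes at s form a single character in shortest form,
 * 0 if they are malformed or overlong. */
int utf8_is_shortest_form(const unsigned char *s, size_t n) {
    /* For each length: which lead-byte bits are fixed, and their value. */
    static const unsigned char lead_check[5] = {0, 0x80, 0xE0, 0xF0, 0xF8};
    static const unsigned char lead_bits[5]  = {0, 0x00, 0xC0, 0xE0, 0xF0};
    static const unsigned char lead_mask[5]  = {0, 0x7F, 0x1F, 0x0F, 0x07};

    assert(s != NULL);
    if (n < 1 || n > 4) return 0;
    if ((s[0] & lead_check[n]) != lead_bits[n]) return 0;

    /* Decode the payload bits into a code point. */
    uint32_t cp = s[0] & lead_mask[n];
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Shortest form means the sequence is exactly as long as required. */
    return utf8_encoded_length(cp) == n;
}
```

If that holds for the decoder in use, the byte count reported by -lengthOfBytesUsingEncoding: for a decoded string should match the number of input bytes that produced it.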