I was writing a CLI tool for Mac OS X (10.5+) that has to deal with command-line arguments which are very likely to contain non-ASCII characters.
For further processing, I convert these arguments using +[NSString stringWithCString:encoding:].
My problem is that I couldn’t find good information on how to determine the character encoding used by the shell in which said CLI tool is running.
What I came up with as a solution is the following:
    NSDictionary *environment = [[NSProcessInfo processInfo] environment];
    NSString *ianaName = [[environment objectForKey:@"LANG"] pathExtension];
    NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(
        CFStringConvertIANACharSetNameToEncoding( (CFStringRef)ianaName ) );
    NSString *someArgument = [NSString stringWithCString:argv[someIndex] encoding:encoding];
I find that a little crude, however, which makes me think that I missed something obvious… but what?
Is there a saner/cleaner way of achieving essentially the same?
Thanks in advance
D
Okay, it turns out there seems to be none!
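As Yuji pointed out, the underlying encoding of filenames is UTF-8, no matter what. Therefore, one needs to handle two scenarios:
1. Arguments typed in by the user, which arrive in whatever encoding the terminal is set to.
2. Arguments that originate from filenames, for example via tools like ls, as they do not convert any characters.
The second case is simply covered by the assumption of UTF-8.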
The first case, however, is problematic:
- The approach from the question relies on $LANG having the form de_DE.IANA_NAME.
- I didn’t test each and every charset I could think of, but none of the European ones were included. Instead, $LANG contained only the language-locale (de_DE in my case)!
Since the results of calling +[NSString stringWithCString:encoding:] with an incorrect encoding are undefined, you cannot safely assume that it will return nil in that case* (e.g. if the input is ASCII-only, it might work perfectly fine!).
What adds to the overall mess is that $LANG is not guaranteed to be around anyway: there’s a checkbox in Terminal.app’s preferences that lets a user not set $LANG at all (not to speak of X11.app, which doesn’t seem to handle any non-ASCII input…).
So what’s left:
1. Check for $LANG. If it’s not set, Goto:4!
2. Check whether $LANG contains information on the encoding. If it doesn’t, Goto:4!
3. Use the encoding named in $LANG and Goto:6.
4. If argc is greater than 2 and [[NSString stringWithCString:argv[1] encoding:NSUTF8StringEncoding] isEqualToString:yourForceUTFArgumentFlag], print that you are forcing UTF-8 now and Goto:6. If not:
5. Tell the user to pass yourForceUTFArgumentFlag as the first argument and exit().
6. Convert the arguments using the encoding determined above.
Sounds shitty? That’s because it is, but I can’t think of any saner way of doing it.
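The decision procedure above can be sketched in plain C, which also compiles unchanged inside an Objective-C source file. This is only a rough sketch under a few assumptions: the flag name `--force-utf8`, the `EncodingDecision` enum, and both function names are made up for illustration, and the parser assumes the usual POSIX shape `language_TERRITORY.codeset@modifier` for the value of $LANG. Unlike the `pathExtension` call in the question, it also strips a trailing modifier such as `@euro`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag name; the post only calls it yourForceUTFArgumentFlag. */
#define FORCE_UTF8_FLAG "--force-utf8"

typedef enum {
    ENCODING_FROM_LANG,    /* $LANG named a codeset; it was copied to `out` */
    ENCODING_FORCED_UTF8,  /* the user passed the force flag as first argument */
    ENCODING_UNKNOWN       /* caller should print a hint and exit */
} EncodingDecision;

/* Copies the codeset part of a LANG value ("de_DE.UTF-8" -> "UTF-8") into out.
   Returns 1 on success, 0 if lang is NULL or carries no codeset ("de_DE"). */
static int codeset_from_lang(const char *lang, char *out, size_t out_size)
{
    if (lang == NULL || out_size == 0) return 0;

    const char *dot = strchr(lang, '.');
    if (dot == NULL) return 0;

    const char *start = dot + 1;
    size_t len = strcspn(start, "@");      /* drop a modifier such as "@euro" */
    if (len == 0 || len >= out_size) return 0;

    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* `lang` is the value of getenv("LANG") and may be NULL. */
static EncodingDecision decide_encoding(const char *lang, int argc, char **argv,
                                        char *out, size_t out_size)
{
    /* $LANG is set and names an encoding: use it. */
    if (codeset_from_lang(lang, out, out_size)) return ENCODING_FROM_LANG;

    /* Otherwise accept an explicit opt-in to UTF-8 as the first argument. */
    if (argc > 2 && strcmp(argv[1], FORCE_UTF8_FLAG) == 0)
        return ENCODING_FORCED_UTF8;

    /* Nothing to go on: the caller should explain the flag and exit. */
    return ENCODING_UNKNOWN;
}
```

In a real tool one would presumably call `decide_encoding(getenv("LANG"), argc, argv, buf, sizeof buf)` at the top of `main` and, for `ENCODING_FROM_LANG`, hand `buf` to `CFStringConvertIANACharSetNameToEncoding` as in the question.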
* One further note though:
If you are using UTF-8 as the encoding, stringWithCString:encoding: returns nil whenever it encounters non-ASCII characters in a C string that is not encoded in UTF-8.
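To illustrate why a strict UTF-8 decoder rejects such input, here is a small hand-rolled well-formedness check in plain C. It is not how Foundation implements `stringWithCString:encoding:`; the function name `is_valid_utf8` is made up, and the sketch only shows that the bytes of a typical ISO-8859-1 string do not form valid UTF-8 sequences. Note that a few byte sequences from other encodings happen to be valid UTF-8 as well, so a non-nil result is no proof that the input really was UTF-8.

```c
#include <stddef.h>

/* Returns 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
static int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned int cp, min;
        int extra;

        if (*p < 0x80) { p++; continue; }                 /* plain ASCII */
        else if ((*p & 0xE0) == 0xC0) { extra = 1; cp = *p & 0x1F; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; min = 0x10000; }
        else return 0;          /* stray continuation byte or invalid lead byte */
        p++;

        while (extra-- > 0) {
            /* A NUL here fails the test too, so we never read past the end. */
            if ((*p & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp < min) return 0;                           /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                      /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;       /* surrogate range */
    }
    return 1;
}
```

For example, "Grüße" encoded as UTF-8 (`"Gr\xC3\xBC\xC3\x9F" "e"`) passes the check, while the same word in ISO-8859-1 (`"Gr\xFC\xDF" "e"`) fails at the byte 0xFC, which is not a valid UTF-8 lead byte.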