In the C standard library functions, the elements of strings are chars. Is there a good reason why char was chosen instead of unsigned char?
Using unsigned char for 8-bit strings has some advantages, albeit small ones:
- It is more intuitive: we usually memorize ASCII codes as unsigned values, and when working with binary data we prefer the unsigned range 0x00 to 0xFF to dealing with negative numbers. With char, we have to cast.
- Working with unsigned integers might be faster or more efficient, or generate smaller code, on some processors.
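To make the casting point concrete, here is a minimal sketch. The helper names `byte_value` and `count_alpha` are hypothetical, chosen only for illustration, and the example assumes 8-bit bytes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: convert one element of a char string to its byte
   value in 0..UCHAR_MAX. If char is signed, bytes >= 0x80 are stored as
   negative values; converting to unsigned char maps them back to
   0x80..0xFF (assuming 8-bit bytes). */
static unsigned byte_value(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: count alphabetic characters in a string.
   The <ctype.h> functions require an argument that is representable as
   unsigned char (or equal to EOF), so the cast is needed whenever the
   string may contain bytes outside 0..127. */
static size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```

With these, `byte_value('\xE9')` yields 0xE9 whether char is signed or unsigned, which is the kind of cast the first bullet complains about having to write.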
C provides three different character types:
- char represents a character (which C also calls a “byte”).
- unsigned char represents a byte-sized pattern of bits, or an unsigned integer.
- signed char represents a byte-sized signed integer.
It is implementation-defined whether char is a signed or an unsigned type, so I think the question amounts to either “why does char exist at all as this maybe-signed type?” or “why doesn’t C require char to be unsigned?”.
The first thing to know is that Ritchie added the “char” type to the B language in 1971, and C inherited it from there. Prior to that, B was word-oriented rather than byte-oriented (so says the man himself, see “The Problems of B”).
With that done, the answer to both of my questions might be that early versions of C didn’t have unsigned types.
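Since the signedness of char is implementation-defined, a program can ask which choice its implementation made. The following is a small sketch; the function names `char_is_signed` and `high_byte_as_int` are hypothetical, and the second one assumes 8-bit bytes and a two's complement representation.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
   implementation, 0 otherwise. <limits.h> defines CHAR_MIN as 0 when
   char is unsigned and as SCHAR_MIN (a negative value) when it is
   signed. */
static int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: stores the byte 0xFF in a plain char and reads it
   back as an int. Assuming 8-bit bytes and two's complement, the result
   is -1 where char is signed and 255 where it is unsigned. */
static int high_byte_as_int(void)
{
    char c = '\xFF';
    return c;
}
```

Code that needs the same answer everywhere can avoid the question by using signed char or unsigned char explicitly.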
Once char and the string-handling functions were established, changing them all to unsigned char would be a serious breaking change (i.e. almost all existing code would stop working), and one of the ways C has tried to cultivate its user-base over the decades is by mostly avoiding catastrophic incompatible changes. So it would be surprising for C to make that change.
Given that char is going to be the character type, and that (as you observe) it makes a lot of sense for it to be unsigned, but that plenty of implementations already existed in which char was signed, I suppose that making the signedness of char implementation-defined was a workable compromise: existing code would continue working. Provided that it was using char only as a character and not for arithmetic or order comparisons, it would also be portable to implementations where char is unsigned.
Unlike some of C’s age-old implementation-defined variations, this one is still live: implementers do still choose signed characters (Intel). The C standard committee cannot help but observe that some people seem to stick with signed characters for some reason. Whatever those people’s reasons are, current or historical, C has to allow it because existing C implementations rely on it being allowed. So forcing char to be unsigned is far lower on the list of achievable goals than forcing int to be 2’s complement, and C did not do even that until C23.
A supplementary question is “why does Intel still specify char to be signed in its ABIs?”, to which I don’t know an answer, but I’d guess that they’ve never had an opportunity to do otherwise without massive disruption. Maybe they even like signed characters.
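As an illustration of the portability condition mentioned above (using char only as a character, not for arithmetic or order comparisons), here is a hedged sketch. The names `less_naive` and `less_portable` are hypothetical and exist only for this example.

```c
/* Hypothetical helper: order comparison written directly on char.
   For bytes >= 0x80 the result depends on whether char is signed,
   so code like this is not portable across implementations. */
static int less_naive(char a, char b)
{
    return a < b;
}

/* Hypothetical helper: the same comparison on the byte values.
   It gives the same answer on every implementation, and it matches
   the order used by strcmp, which compares characters as
   unsigned char. */
static int less_portable(char a, char b)
{
    return (unsigned char)a < (unsigned char)b;
}
```

For `'a'` and `'\xE9'`, `less_portable` always reports that `'a'` comes first, while `less_naive` agrees only where char is unsigned.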