I was reading the book Programming in C by Stephen G. Kochan. It states that:
“if a character value is used that is not part of the standard character set, its sign might be extended when converted to an integer”
And then it states
“C language permits character variables to be declared unsigned, thus avoiding this potential problem”
Can someone explain what problem may occur when extending the sign during conversion from char to int?
And why does this matter?
And what’s wrong with a negative integer that is converted from a char?
Thank You
Let’s say you take an innocent-looking function from <ctype.h>, isupper(). It’s declared as int isupper(int c);, so it takes an int and returns an int.
Now, let’s say that you’re not a very careful programmer, and you just pass your char to this function. You think to yourself: “What could go wrong? This is the simplest function I know!”
But you’d be wrong. Somewhere, someone will have her MP3 player going into an endless crash-loop because of this terrible mistake.
And here’s why. The most annoying type in C is char. It can be signed, it can be unsigned, you can force the compiler one way or another (but then you open another can of worms), and worst of all, the standard C library uses this type everywhere!
So, you use char, but you’re not aware of the fact that it’s actually signed in your environment. You use it as if the world is an ASCII world.
But the world isn’t. And that happy MP3 player owner is now listening to a famous German song whose name contains the letter ä (“extended ASCII code 132”).
You pass this character to isupper(), and the compiler does the following horror:
“Ah, it’s a character, but the function takes an integer. I know! I will not warn the programmer, because that’s too simple. I’ll just convert the character to an integer and pass it along. How do I do that? Let’s check the C standard… Hmmm… Simple, just take the value and sign-extend it (because char is signed, don’t you know?). Now, this character has the value -124, so I’ll just convert it to an int with the value -124. That was simple, I don’t see what the fuss is about. Why should I even warn the programmer?!”
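The conversion described above can be sketched as follows. This is only an illustration: the helper names widen_signed, widen_unsigned and plain_char_is_signed are made up for this example, and it assumes the usual 8-bit char, where the byte 0x84 read as a signed char is -124.

```c
#include <limits.h>

/* Hypothetical helpers, named only for this illustration. */

/* The byte lives in a signed char: converting to int keeps the value,
   so a negative char becomes a negative int (the sign is extended). */
int widen_signed(signed char c)
{
    return c;   /* -124 stays -124 */
}

/* The same byte in an unsigned char has no sign to extend. */
int widen_unsigned(unsigned char c)
{
    return c;   /* 0x84 stays 132 */
}

/* Plain char behaves like one of the two above; which one is up to
   the implementation. CHAR_MIN is negative only when char is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With these, widen_signed(-124) yields -124 while widen_unsigned(0x84) yields 132, even though both arguments have the same bit pattern in one byte.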
And now isupper() is called with -124 instead of 132.
But what’s wrong with that? Nothing, except that the C library that comes with the compiler implements isupper() using a simple 128-byte array: it simply returns the value at the given index. The array is initialised with 0 everywhere except for upper-case ASCII codes, where it’s 1. Such a simple and elegant implementation…
But wait, what happens if you pass a negative value to this function? Well, that’s not allowed: the C standard requires the argument to be representable as an unsigned char or to equal EOF.
So, undefined behaviour. In this case, it tries to access memory that doesn’t belong to the process, and BAM! the program crashes.
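A rough sketch of such a table-based implementation might look like the code below. It is not taken from any real C library: upper_table, naive_isupper and checked_isupper are hypothetical names, and the table setup assumes ASCII, where 'A' to 'Z' are contiguous.

```c
/* Hypothetical 128-entry table: 1 for 'A'..'Z', 0 elsewhere (ASCII assumed). */
static unsigned char upper_table[128];
static int table_ready = 0;

static void init_table(void)
{
    for (int ch = 'A'; ch <= 'Z'; ch++)
        upper_table[ch] = 1;
    table_ready = 1;
}

/* Naive lookup in the spirit of the description: no range check.
   Calling it with -124 would read upper_table[-124], which lies
   outside the array, so the behaviour is undefined. */
int naive_isupper(int c)
{
    if (!table_ready)
        init_table();
    return upper_table[c];
}

/* Defensive variant for comparison: reject anything outside the table. */
int checked_isupper(int c)
{
    if (!table_ready)
        init_table();
    if (c < 0 || c >= 128)
        return 0;
    return upper_table[c];
}
```

The naive version is perfectly fine for arguments in the range 0 to 127. The trouble only starts when a sign-extended char sneaks in as a negative index.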
So you see, char is evil and you should never use it (*), unless you really understand how to use it properly.
(*) As Keith Thompson said in a comment, it is of course impossible to avoid using char. From strlen() to curl_easy_escape(), everybody uses char. But you should be aware of conversions to int, especially when char may hold a negative number. <ctype.h> functions and array indices are two places where it’s easy to make costly mistakes.
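One common way to stay out of trouble is to convert the char to unsigned char before it reaches a <ctype.h> function, so the value is never negative. A minimal sketch, where is_upper_char and count_upper are hypothetical wrapper names used only for this example:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical wrapper: safe to call with any char value.
   The cast maps a negative char such as -124 to 132 before
   the implicit conversion to int takes place. */
int is_upper_char(char c)
{
    return isupper((unsigned char)c) != 0;
}

/* Count upper-case letters in a string, casting each char first. */
size_t count_upper(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```

The same cast helps when a char is used as an array index: index with (unsigned char)c rather than c, and make sure the array has room for every value an unsigned char can hold.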