I’m learning C on Linux and I’ve come across a slightly weird situation.
As far as I know, standard C’s char data type is ASCII, 1 byte (8 bits). That should mean it can hold only ASCII characters.
In my program I use char input[], which is filled with the getchar function, roughly like this:
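char input[20];
int z, i;
for (i = 0; i < 20; i++)
{
    z = getchar();   /* int, so EOF can be told apart from a valid byte */
    if (z == EOF)
        break;       /* stop when the input ends */
    input[i] = z;
}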
The weird thing is that it works not only for ASCII characters, but for any character I can imagine, such as @&@{čřžŧ¶'`[łĐŧđж←^€~[←^ø{&}čž on the input.
My question is: how is this possible? It seems to be one of the many beautiful exceptions in C, but I would really appreciate an explanation. Is it a matter of the OS, the compiler, or some hidden extra feature of the language?
Thanks.
There is no magic here. The C language gives you access to the raw bytes, as they are stored in the computer's memory.
If your terminal is using UTF-8 (which is likely), non-ASCII characters take more than one byte in memory. When you display them again, it is the terminal's code that converts these byte sequences into single displayed characters.
Just change your code to print the strlen of the strings (terminate the buffer with '\0' first), and you will see what I mean.
To properly handle non-ASCII UTF-8 characters in C you will generally want a library to handle them for you, such as GLib, Qt, or many others.