I have read through the C11 standard, section 7.21 where <stdio.h> is described. The standard first describes streams as:
7.21.2.2:
A text stream is an ordered sequence of characters …
7.21.2.3:
A binary stream is an ordered sequence of characters …
Which doesn’t specify the type of the stream characters (since this depends on orientation). It later says:
7.21.3.12:
… The byte output functions write characters to the stream as if by successive calls to the fputc function.
From fputc (7.21.7.3.2):
The fputc function writes the character specified by c (converted to an unsigned char) to the output stream pointed to by stream …
Which indicates the int c argument of fputc is converted to an unsigned char before being written to the stream. A similar note is given for fgetc:
7.21.7.1.2:
the fgetc function obtains that character as an unsigned char converted to an int
The same kind of wording appears for ungetc, fread and fwrite.
Now this all hints that, internally, a byte-oriented stream is represented as a sequence of unsigned char.
However, looking at the internals of the Linux kernel, files seem to be treated as streams of char. One reason I say this is that the file_operations read and write callbacks take char __user * and const char __user * respectively.
In glibc, FILE is a typedef of struct _IO_FILE, which is defined in libio/libio.h. In this struct, too, all the read and write pointers are char *.
In C++, the basic_ostream::write function takes const char * as input and similarly basic_istream::read (but I’m not interested in C++ in this question).
My question is: do the quotes above imply that FILE streams should be treated as streams of unsigned char? If so, why do glibc and the Linux kernel implement them with char *? If not, why does the standard insist on converting the characters to unsigned char?
It doesn’t really matter. The standard uses unsigned char in a few chosen places because that allows a precise formulation there:
fgetc is specified to return an unsigned char converted to an int, so that one knows the result is non-negative except when it is EOF. There is thus no possible confusion between EOF and a valid char, a confusion that causes bugs when the result of fgetc is stored directly in a char without checking for EOF first.
fputc is specified to take an int and convert it to an unsigned char because this conversion is well specified. If one isn’t careful, a formulation that did not use unsigned char could make an otherwise ordinary sequence of calls UB when char is signed and the character is negative.