Let’s say I have a text file called sometext.txt.
It has a line, “Sic semper tyrannis”, which is (correct me if I’m wrong):
83 105 99 32 115 101 109 112 101 114 32 116 121
114 97 110 110 105 115
(in decimal ASCII)
When I read this line from the file using standard library file I/O routines, I don’t do any character encoding work (or do I?).
The question is:
Which software component actually converts 0s and 1s into characters (i.e. contains the algorithm for converting 0s and 1s into characters)? Is it an OS component? If so, which one?
It’s all a bunch of 1’s and 0’s.
An ASCII “A” is just the letter displayed when the value 65 (01000001 in binary, 0x41 in hex) is encountered in a context that treats it as text. There is no “conversion”; it’s just a different view of the same thing, defined by an accepted mapping.
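As a rough illustration, here is the phrase from the question viewed both ways. The question doesn’t name a language, so Python is an assumption here, as are the variable names:

```python
# The phrase from the question as raw bytes (ASCII characters only).
data = b"Sic semper tyrannis"

# View 1: the bytes as decimal numbers.
codes = list(data)
print(codes)

# View 2: the same numbers looked up in the character table.
# Nothing in the data changes; only the interpretation does.
text = "".join(chr(n) for n in codes)
print(text)
```

The list printed first should match the decimal values given in the question.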
Unicode (and other multi-byte character sets) can be stored in several different encodings. In UTF-8, for instance, a single Unicode character is encoded as 1, 2, 3 or 4 bytes depending on the character. Conversion between Unicode encodings often takes place in the I/O libraries that come as part of a language or runtime; however, a Unicode-aware operating system also needs to understand a Unicode encoding itself (in system calls), so the line can be blurred.
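A minimal sketch of both points, again assuming Python: the sample characters are arbitrary picks, and sometext.txt is written to a temporary directory just for the demonstration. Opening the file in binary mode returns the raw bytes; opening it in text mode lets the I/O library do the decoding.

```python
import os
import tempfile

# One sample character for each possible UTF-8 length (1 to 4 bytes).
samples = ["A", "é", "€", "😀"]
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(lengths)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sometext.txt")

    # Write the UTF-8 bytes for the euro sign.
    with open(path, "wb") as f:
        f.write("€".encode("utf-8"))

    # Binary mode: no decoding, just the bytes on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode: the I/O library decodes the bytes into a character.
    with open(path, "r", encoding="utf-8") as f:
        decoded = f.read()

print(raw, decoded)
```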
UTF-8 has the nice property that every 7-bit ASCII character maps to the same single byte it has in ASCII, which makes it the Unicode encoding most compatible with traditional ASCII.
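This is easy to check for yourself. A small sketch, assuming Python, that compares the two encodings for the phrase from the question and for all 128 ASCII code points:

```python
phrase = "Sic semper tyrannis"

# For pure ASCII text, the ASCII and UTF-8 byte sequences are identical.
same_for_phrase = phrase.encode("ascii") == phrase.encode("utf-8")

# The same holds for every one of the 128 ASCII code points.
all_ascii = bytes(range(128))
same_for_all = all_ascii.decode("ascii").encode("utf-8") == all_ascii

print(same_for_phrase, same_for_all)
```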