Java’s default encoding is ASCII, yes? (See my edit below.)
When a text file is encoded in UTF-8, how does a Reader know that it has to use UTF-8?
The Readers I talk about are:

- `FileReader`s
- `BufferedReader`s from `Socket`s
- A `Scanner` from `System.in`
- …
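For context, a sketch of how I construct such readers — note that no encoding is specified anywhere (the file path is made up for the example):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class ReaderConstruction {

    // A FileReader wrapped in a BufferedReader: no charset is passed anywhere.
    public static String firstLine(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine("data.txt")); // hypothetical file

        // A Scanner from System.in: again, no charset parameter.
        Scanner scanner = new Scanner(System.in);
        // (A BufferedReader from a Socket would wrap
        // socket.getInputStream() the same way, also without a charset.)
    }
}
```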
EDIT
It turns out the encoding depends on the OS, which means that the following is not true on every OS:
`'a' == 97`
You normally specify that yourself in an `InputStreamReader`. It has a constructor taking the character encoding, e.g. `new InputStreamReader(in, "UTF-8")`.

All other readers (as far as I know) use the platform default character encoding, which may indeed not per se be the correct encoding (such as, cough, CP-1252).

You can in theory also detect the character encoding automatically based on the byte order mark. This distinguishes the several Unicode encodings from other encodings. Java SE unfortunately doesn’t have any API for this, but you can homebrew one which can be used in place of the `InputStreamReader` mentioned above.

Edit as a reply on your edit:
No, this is not true. The `ASCII` encoding (which contains 128 characters, `0x00` up to and including `0x7F`) is the basis of all other character encodings. Only the characters outside the `ASCII` charset may risk being displayed differently in another encoding. The `ISO-8859` encodings cover the characters in the `ASCII` range with the same codepoints. The `Unicode` encodings cover the characters in the `ISO-8859-1` range with the same codepoints.

You may find each of those blogs an interesting read: