I understand that Java character streams wrap byte streams such that the underlying byte stream is interpreted as per the system default or an otherwise specifically defined character set.
My system's default charset is UTF-8.
If I use a FileReader to read in a text file, everything looks normal, as the default charset is used to interpret the bytes from the underlying byte stream. If I explicitly define an InputStreamReader to read the UTF-8 encoded text file in as UTF-16, everything obviously looks strange. If I use a byte stream such as FileInputStream and copy its output straight to System.out, everything looks fine.
So, my questions are:
- Why is it useful to use a character stream?
- Why would I use a character stream instead of directly using a byte stream?
- When is it useful to define a specific charset?
Code that deals with strings should only “think” in terms of text – for example, reading an input source line by line, you don’t want to care about the nature of that source.
However, storage is usually byte-oriented – so you need to create a conversion between the byte-oriented view of a source (encapsulated by InputStream) and the character-oriented view of a source (encapsulated by Reader).
So a method which (say) counts the lines of text in an input source should take a Reader parameter. If you want to count the lines of text in two files, one of which is encoded in UTF-8 and one of which is encoded in UTF-16, you’d create an InputStreamReader around a FileInputStream for each file, specifying the appropriate encoding each time.
(Personally I would avoid FileReader completely – the fact that it doesn’t let you specify an encoding makes it useless IMO.)
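To make the line-counting idea concrete, here is a minimal sketch. The class name LineCounter, the helper countLinesInFile, and the temporary sample files are all made up for illustration; only the pattern of an InputStreamReader wrapped around a FileInputStream with an explicit charset comes from the description above.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineCounter {

    // Works purely on characters: it neither knows nor cares how the text was stored.
    static int countLines(Reader reader) throws IOException {
        BufferedReader buffered = new BufferedReader(reader);
        int lines = 0;
        while (buffered.readLine() != null) {
            lines++;
        }
        return lines;
    }

    // Hypothetical helper: the byte-to-character conversion is chosen here,
    // at the point where the file is opened, and nowhere else.
    static int countLinesInFile(Path file, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new FileInputStream(file.toFile()), charset)) {
            return countLines(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "first\nsecond\nthird\n";

        // Write the same text in two different encodings (sample files for the demo only).
        Path utf8File = Files.createTempFile("sample-utf8", ".txt");
        Path utf16File = Files.createTempFile("sample-utf16", ".txt");
        Files.write(utf8File, text.getBytes(StandardCharsets.UTF_8));
        Files.write(utf16File, text.getBytes(StandardCharsets.UTF_16));

        // The bytes on disk differ, but the counting code sees the same characters.
        System.out.println(countLinesInFile(utf8File, StandardCharsets.UTF_8));   // 3
        System.out.println(countLinesInFile(utf16File, StandardCharsets.UTF_16)); // 3

        Files.delete(utf8File);
        Files.delete(utf16File);
    }
}
```

Note that countLines never mentions an encoding; the only place a charset appears is where the byte stream is turned into a Reader. That is also the answer to "when is it useful to define a specific charset": whenever the encoding of the data is known and might not match the platform default.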