Why can a byte in Java I/O represent a character?
And I see the characters are only ASCII. Then it’s not dynamic, right?
Is there any explanation for this?
What is the difference between byte streams and character streams?
Bytes are not characters. Alone, they can’t even represent characters.
Computingwise, a “character” is a pairing of a numeric code (or sequence of codes) with an encoding or character set that defines how the codes map to real-world characters (or to whitespace, or to control codes).
Only once paired with an encoding can bytes represent characters. With some encodings (like ASCII or ISO-8859-1), one byte can represent one character…and many encodings are even ASCII-compatible (meaning that the character codes from 0 to 127 align with ASCII’s definition for them)…but without the original mapping, you don’t know what you have.
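To make the "one byte per character" point concrete, here is a minimal sketch. The class and method names are made up for illustration; the only library calls are String.getBytes(Charset) and the StandardCharsets constants.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares how many bytes the same text
// occupies under two different encodings.
class EncodedLengthDemo {
    // Bytes needed to store the text in ISO-8859-1 (one byte per character).
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed to store the text in UTF-8 (one to four bytes per code point).
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // ASCII-compatible encodings agree on the byte used for codes 0 to 127.
    static boolean sameByteForAsciiLetter() {
        byte ascii = "A".getBytes(StandardCharsets.US_ASCII)[0];
        byte latin1 = "A".getBytes(StandardCharsets.ISO_8859_1)[0];
        byte utf8 = "A".getBytes(StandardCharsets.UTF_8)[0];
        return ascii == 0x41 && latin1 == 0x41 && utf8 == 0x41;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // U+00E9, LATIN SMALL LETTER E WITH ACUTE
        System.out.println(latin1Length(eAcute)); // prints 1
        System.out.println(utf8Length(eAcute));   // prints 2
        System.out.println(sameByteForAsciiLetter()); // prints true
    }
}
```

The same character takes one byte in one encoding and two in another, which is why a lone byte count or byte value tells you nothing until you know the mapping.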
Without an encoding, bytes are just 8-bit integers.
You can interpret them any way you like by forcing an encoding onto them. That is exactly what you’re doing when you convert a byte to char, say new String(myBytes), etc., or even edit a file containing the bytes in a text editor. (In that case, it’s the editor applying the encoding.) In doing so, you might even get something that makes sense. But without knowing the original encoding, you can’t know for sure what those bytes were intended to represent. It might not even be text.
For example, consider the byte sequence 0x48 0x65 0x6c 0x6c 0x6f 0x2e. It can be interpreted as:
- Hello. in ASCII and compatible 8-bit encodings;
- dinner in some 8-bit encoding I made up just to prove this point;
- 䡥汬漮 in big-endian UTF-16*;
- load r101, [0x6c6c6f2e] in some unknown processor’s assembly language;
- or any of a million other things.
Those six bytes alone can’t tell you which interpretation is correct.
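Two of those interpretations are easy to reproduce in Java. This is only a sketch: the class name and the decode helper are invented for the example, and it relies on the String(byte[], Charset) constructor to apply the encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the same six bytes with two encodings.
class SameBytesDemo {
    // The byte sequence from the example above.
    static final byte[] BYTES = {0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2e};

    // Interprets BYTES as text using the given encoding.
    static String decode(Charset charset) {
        return new String(BYTES, charset);
    }

    public static void main(String[] args) {
        String ascii = decode(StandardCharsets.US_ASCII);
        String utf16 = decode(StandardCharsets.UTF_16BE);

        // Six bytes become six chars in ASCII but only three in UTF-16.
        System.out.println(ascii + " (" + ascii.length() + " chars)");
        System.out.println(utf16 + " (" + utf16.length() + " chars)");
    }
}
```

Whether the second line shows up correctly in your terminal depends on the encoding the console uses, which is the same problem one level up.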
With text, at least, that’s what encodings are for.
But if you want the interpretation to be right, you need to use the same encoding to decode those bytes as was used to generate them. That’s why it’s so important to know how your text was encoded.
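Here is roughly what goes wrong when the encodings don't match. The sample text, class name, and roundTrip helper are my own illustration, not anything from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encodes text with one charset and decodes it
// with another to show what a mismatch does.
class MismatchDemo {
    // Encodes the text to bytes, then decodes those bytes back to text.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // Same encoding both ways: the text survives.
        String matched = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Written as UTF-8 but read as ISO-8859-1: the two bytes of the
        // accented letter are decoded as two separate characters.
        String mismatched = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(matched.equals(original));    // prints true
        System.out.println(mismatched.equals(original)); // prints false
        System.out.println(mismatched.length());         // prints 5
    }
}
```

Note that the ASCII part ("caf") comes through fine either way, which is why this kind of bug often goes unnoticed until someone's name has an accent in it.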
The difference between a byte stream and a character stream is that the character stream attempts to work with characters rather than bytes. (It actually works with UTF-16 code units. But since we know the encoding, that’s good enough for most purposes.) If it’s wrapped around a byte stream, the character stream uses an encoding to convert the bytes read from the underlying byte stream to chars (or chars written to the stream to bytes).
* Note: I don’t know whether “䡥汬漮” is profanity or even makes any sense…but neither does a computer unless you program it to read Chinese.
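To see the byte stream versus character stream difference in code, here is a small sketch that reads the same data both ways. The class, method names, and sample text are assumptions for the example; the wrapping itself is done with InputStreamReader around a ByteArrayInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reads the same data once as a byte stream
// and once as a character stream wrapped around a byte stream.
class StreamLayersDemo {
    // Byte stream: each read() returns one raw byte (0 to 255), or -1 at the end.
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Character stream: the reader decodes the underlying bytes with the
    // given encoding, and each read() returns one UTF-16 code unit.
    static int countChars(byte[] data, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "héllo" stored as UTF-8: the accented letter takes two bytes.
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        System.out.println(countBytes(data));                              // prints 6
        System.out.println(countChars(data, StandardCharsets.UTF_8));      // prints 5
        System.out.println(countChars(data, StandardCharsets.ISO_8859_1)); // prints 6
    }
}
```

The last line is the reader being handed the wrong encoding: it happily produces six chars, just not the ones that were written.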