Hi all the code is as follows: File file2 = new File(D://deploy//body.txt); byte[] bytes

Asked: May 23, 2026

Hi all, the code is as follows:

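```java
import java.io.*;

File file2 = new File("D://deploy//body.txt");

// loadFile is my own helper (not shown); it reads the whole file into a byte array
byte[] bytes = loadFile(file2);
System.out.println(bytes.length);

// Decode the raw bytes as CP1252, one char at a time
StringBuffer buffer = new StringBuffer();
InputStream inputStream = new ByteArrayInputStream(bytes);
InputStreamReader reader = new InputStreamReader(inputStream, "CP1252");
Reader in = new BufferedReader(reader);
int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char) ch);
}
in.close();

// Re-encode the decoded text with getBytes() (no charset argument) and print the length
System.out.println(buffer.toString().getBytes().length);
```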

The program prints 1576 and 2439 for the lengths of the two byte arrays. What is the proper way to convert a CP1252 byte array to a string and retain the original size? Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-23T06:13:25+00:00

I noticed your phrase "proper string" and would like to point out that, in your case, there is no such thing as a proper or improper string. It's the encoding that is either proper or improper.

You're reading a sequence of CP1252 bytes and appending the decoded characters to a buffer. If the original file really is CP1252, there is nothing wrong with this process. Under the hood, InputStreamReader uses a CharsetDecoder to decode the stream's bytes from the underlying charset into a sequence of 16-bit Unicode characters (UTF-16). This happens because you are reading characters, not bytes, from the stream.
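A minimal sketch of the decoding step, assuming a short made-up sample of CP1252 bytes in place of the file contents. It compares the question's InputStreamReader loop with Charset.decode, which runs a CharsetDecoder over the whole buffer in one call. The class and helper names (Cp1252DecodeDemo, decodeWithReader, decodeDirect) are hypothetical, and "windows-1252" is the JDK's canonical name for CP1252 (it is not one of the guaranteed StandardCharsets constants, though common JDKs ship it). Because CP1252 is a single-byte encoding, every byte decodes to exactly one char.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class Cp1252DecodeDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Same approach as the question: read chars one at a time through an InputStreamReader.
    static String decodeWithReader(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), CP1252))) {
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Charset.decode applies a CharsetDecoder (replacing malformed input) in one call.
    static String decodeDirect(byte[] bytes) {
        return CP1252.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // Sample bytes: C a f e-acute space euro-sign 5 (0xE9 = U+00E9, 0x80 = U+20AC in CP1252)
        byte[] bytes = {0x43, 0x61, 0x66, (byte) 0xE9, 0x20, (byte) 0x80, 0x35};
        String viaReader = decodeWithReader(bytes);
        String direct = decodeDirect(bytes);
        System.out.println(viaReader.equals(direct)); // true
        System.out.println(bytes.length + " bytes -> " + viaReader.length() + " chars"); // 7 bytes -> 7 chars
    }
}
```

For a whole file, new String(bytes, CP1252) gives the same result as either helper with less code.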

As bmargulies pointed out, when you execute buffer.toString().getBytes() you are encoding those UTF-16 characters into a byte sequence using the platform's default charset. Since that charset is evidently not CP1252 on your machine, the lengths of the original byte array and the re-encoded one are not comparable. Passing the charset to getBytes() (for example getBytes("CP1252")) causes a StringEncoder (an internal class in the Oracle/Sun JVM; other implementations might use a different class) to transform the UTF-16 character sequence into bytes in the desired encoding (CP1252), and the byte count then matches the original.
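A rough illustration of why the two lengths differ, assuming the same hypothetical sample bytes as a stand-in for the file. UTF-8 is passed explicitly to mimic a non-CP1252 default, since the real default returned by Charset.defaultCharset() varies by platform and JDK version. ReencodeDemo and reencodedLength are illustrative names only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReencodeDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode CP1252 bytes to a String, re-encode with the target charset, and return the byte count.
    static int reencodedLength(byte[] cp1252Bytes, Charset target) {
        String text = new String(cp1252Bytes, CP1252);
        return text.getBytes(target).length;
    }

    public static void main(String[] args) {
        // Sample bytes: C a f e-acute space euro-sign 5
        byte[] original = {0x43, 0x61, 0x66, (byte) 0xE9, 0x20, (byte) 0x80, 0x35};
        System.out.println(original.length);                                   // 7
        System.out.println(reencodedLength(original, StandardCharsets.UTF_8)); // 10 (e-acute: 2 bytes, euro: 3)
        System.out.println(reencodedLength(original, CP1252));                 // 7
        // getBytes() with no argument uses this charset, which differs between machines
        System.out.println(Charset.defaultCharset());
    }
}
```

Passing a Charset object rather than the string name "CP1252" avoids the checked UnsupportedEncodingException that getBytes(String) declares.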
