I really expect that the byte data below should show differently, but in fact,

Asked: June 13, 2026 (2026-06-13T01:39:25+00:00)

I expected the byte data below to differ between the two encodings, but in fact the two arrays are the same. According to Wikipedia (http://en.wikipedia.org/wiki/UTF-8#Examples), the encoded bytes should look different, so why does Java print them out as the same?

    String a = "€";
    byte[] utf16 = a.getBytes(); //Java default UTF-16
    byte[] utf8 = null;

    try {
        utf8 = a.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
        throw new RuntimeException(e);
    }

    for (int i = 0 ; i < utf16.length ; i ++){
        System.out.println("utf16 = " + utf16[i]);
    }

    for (int i = 0 ; i < utf8.length ; i ++){
        System.out.println("utf8 = " + utf8[i]);
    }

1 Answer

Editorial Team · Answer 1 · 2026-06-13T01:39:26+00:00

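Although Java holds characters internally as UTF-16, String.getBytes() with no arguments does not return UTF-16. It encodes each character using the platform's default charset, which depends on the system: it may be something like windows-1252 on Windows or UTF-8 on most Linux and macOS setups (and since JDK 18, UTF-8 is the default unless overridden). The results I’m getting are: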

utf16 = -30
utf16 = -126
utf16 = -84
utf8 = -30
utf8 = -126
utf8 = -84

This indicates that the default encoding is “UTF-8” on my system.
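
If you want to confirm which charset the no-argument getBytes() will use on your own machine, a minimal sketch like the following should do it. The class name DefaultCharsetCheck and the helper defaultCharsetName() are made up for illustration; Charset.defaultCharset() and the file.encoding system property are standard.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Hypothetical helper: name of the charset that String.getBytes()
    // (no arguments) encodes with on this JVM.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset = " + defaultCharsetName());
        // System property traditionally associated with the default charset
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
    }
}
```

The output differs from machine to machine, which is exactly why the question's first loop is not a reliable way to inspect UTF-16.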

Also note that the documentation for String.getBytes() has this comment: “The behavior of this method when this string cannot be encoded in the default charset is unspecified.”

Generally, though, you’ll avoid confusion if you always specify an encoding, as you do with a.getBytes("UTF-8").
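
As a rough sketch of what specifying the encoding looks like, the example below uses the StandardCharsets constants (available since Java 7), which also avoid the checked UnsupportedEncodingException. The class name ExplicitCharsets and the hex() helper are my own for illustration; the bytes are printed as unsigned hex so they can be compared with the Wikipedia table.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsets {
    // Hypothetical helper: formats bytes as unsigned two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "\u20ac"; // the euro sign

        System.out.println("UTF-8    = " + hex(a.getBytes(StandardCharsets.UTF_8)));    // E2 82 AC
        System.out.println("UTF-16BE = " + hex(a.getBytes(StandardCharsets.UTF_16BE))); // 20 AC
        // The plain UTF-16 encoder prepends a big-endian byte order mark.
        System.out.println("UTF-16   = " + hex(a.getBytes(StandardCharsets.UTF_16)));   // FE FF 20 AC
    }
}
```

The signed values -30, -126, -84 in the output above are the same bytes as E2 82 AC; Java's byte type is signed, so values above 0x7F print as negative numbers.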

Another thing that can cause confusion is including Unicode characters directly in your source file: String a = "€";. That euro symbol has to be encoded to be stored as one or more bytes in a file. When Java compiles your program, it sees those bytes and decodes them back into the euro symbol. You hope. You have to be sure that the software that saves the euro symbol into the file (Notepad, Eclipse, etc.) encodes it the same way Java expects when it reads the file back in. UTF-8 is becoming more popular, but it is not universal, and many editors will not write files in UTF-8.
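
One way to sidestep the source-file problem is to write the character as a Unicode escape, which keeps the file pure ASCII; another is to tell the compiler the file's encoding with javac -encoding UTF-8. The sketch below shows the escape and simulates what a mismatch does by decoding the UTF-8 bytes as ISO-8859-1. The class name EuroEscape and the misdecoded() helper are illustrative only, and the simulation stands in for what an editor/compiler mismatch would do rather than reproducing it exactly.

```java
import java.nio.charset.StandardCharsets;

public class EuroEscape {
    // The \u20ac escape is resolved by the compiler, so the file's own
    // encoding no longer matters for this literal.
    static final String EURO = "\u20ac";

    // Simulates a mismatch: bytes written as UTF-8, then read as ISO-8859-1.
    static String misdecoded() {
        byte[] utf8 = EURO.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(EURO.length());                            // 1
        System.out.println(Integer.toHexString(EURO.codePointAt(0))); // 20ac
        // One character has turned into three unrelated ones.
        System.out.println(misdecoded().length());                    // 3
    }
}
```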
