I have code which computes the SHA-256 hash of a String, and noticed that

Question

Asked: June 16, 2026 (2026-06-16T04:08:06+00:00)

I have code which computes the SHA-256 hash of a String, and noticed that I was getting different hashes from Android and Oracle Java 7 for the same string. My hashing code converts the String into byte[] with:

byte[] data = stringData.getBytes("UTF-16");

With UTF-16 encoding, I get different results from Oracle Java and Android Java. This is the string I was hashing:

// Test code:
String toHash = "testdata";
System.out.println("Hash: " + DataHash.getHashString(toHash));

I get these hashes with UTF-16:

Hash: a1112a0363a59097a701e38398e1fdfef3049358aee81b77ecaad2924a426bc5 [Oracle Java 7]
Hash: 811b723aee07c7a52456fc57a5683e73649075a373d341f7257bf73575111ba3 [Android 2.2]

However, with UTF-8, I get the same hash with both JREs:

Hash: 810ff2fb242a5dee4220f2cb0e6a519891fb67f2f828a6cab4ef8894633b1f50 [Oracle Java 7]
Hash: 810ff2fb242a5dee4220f2cb0e6a519891fb67f2f828a6cab4ef8894633b1f50 [Android 2.2]

Is there some kind of endianness issue causing the different results on the two platforms? How should I prepare a String for hashing in a platform-independent way?

EDIT:
Whoops, the answer is rather obvious once you read about UTF-16 a bit more. There are two versions of UTF-16 (big-endian and little-endian). You just need to specify which version getBytes() should use, and the hashes are the same. Pick one of:

UTF-16LE
UTF-16BE
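
As a rough illustration of the fix described in the edit, here is a minimal sketch that hashes a String with an explicitly named charset. The class ExplicitCharsetHash and the helper sha256Hex are hypothetical names made up for this example; they are not the DataHash class from the question, whose source is not shown. It uses getBytes(String) rather than getBytes(Charset) on the assumption that the code also has to run on older Android releases.

```java
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ExplicitCharsetHash {

    // Hypothetical helper for illustration; not the DataHash class from the question.
    static String sha256Hex(String input, String charsetName) {
        try {
            // Naming the byte order explicitly (UTF-16BE or UTF-16LE) means
            // no BOM is written and no platform default is involved.
            byte[] data = input.getBytes(charsetName);
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);

            // Render the digest as lowercase hex, two characters per byte.
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String toHash = "testdata";
        // The two byte orders produce different byte arrays, so the hashes
        // differ from each other; each one is stable across platforms.
        System.out.println("UTF-16BE: " + sha256Hex(toHash, "UTF-16BE"));
        System.out.println("UTF-16LE: " + sha256Hex(toHash, "UTF-16LE"));
    }
}
```

Whichever byte order you pick, every platform that computes the hash has to use the same one. UTF-8, which already gave matching hashes above, has no byte-order variants and is another reasonable choice.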

1 Answer

Editorial Team · Answer 1 · 2026-06-16T04:08:08+00:00

According to the Oracle Java documentation:

When decoding, the UTF-16 charset interprets a byte-order mark to
indicate the byte order of the stream but defaults to big-endian if
there is no byte-order mark; when encoding, it uses big-endian byte
order and writes a big-endian byte-order mark.

That means plain UTF-16 should always encode as Big Endian in Oracle Java.

The Android documentation says:

Charset            Encoder writes
UTF-16BE           BE, no BOM
UTF-16LE           LE, no BOM
UTF-16             BE, with BE BOM

So there is a bug either in one of the implementations or in the documentation. According to both documents, plain UTF-16 encodes as big-endian and writes a BOM, so there shouldn’t be any difference.

In general you should prefer an explicit UTF-16BE or UTF-16LE over plain UTF-16, but in this case the difference seems to be a bug.
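
One way to check which platform deviates from its documentation is to print the raw bytes each charset name produces. The following is a minimal sketch; the class Utf16ByteDump and its helpers are hypothetical names for illustration only. Running it on both platforms and comparing the "UTF-16" line should show whether the BOM or the byte order differs.

```java
import java.io.UnsupportedEncodingException;

public class Utf16ByteDump {

    // Formats bytes as space-separated lowercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Encodes the string with the named charset and returns the bytes as hex.
    static String encodeHex(String s, String charsetName) {
        try {
            return toHex(s.getBytes(charsetName));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "te"; // 't' is U+0074, 'e' is U+0065
        for (String cs : new String[] {"UTF-16", "UTF-16BE", "UTF-16LE"}) {
            System.out.println(cs + ": " + encodeHex(sample, cs));
        }
        // Per the Oracle documentation quoted above, Oracle Java should print:
        //   UTF-16:   fe ff 00 74 00 65   (big-endian BOM, then big-endian data)
        //   UTF-16BE: 00 74 00 65
        //   UTF-16LE: 74 00 65 00
    }
}
```

If the "UTF-16" line on the other platform starts with ff fe or has no BOM at all, that accounts for the different hash, since SHA-256 covers every byte of the input including the BOM.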
