Can two different strings when encoded with different encodings have the same byte sequence?

Asked: June 8, 2026

Can two different strings, when encoded with different encodings, have the same byte sequence?
That is, are there strings that, substituted for “string one” and “string two” in the example below and encoded with two different encodings (Cp1252 and UTF-8 are just examples), would make the test pass?

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

import org.junit.Assert;
import org.junit.Test;

public class EncodingTest {
    @Test
    public void test() throws UnsupportedEncodingException {
        final byte[] sequence1 = "string one".getBytes("Cp1252");
        final byte[] sequence2 = "string two".getBytes("UTF-8");
        Assert.assertTrue(Arrays.equals(sequence1, sequence2));
    }
}

A bug in my code hashes the byte sequence generated from a String using the JVM’s default encoding, and I need to verify whether that will cause hash collisions when the code runs with different strings and different JVM file encodings (which can happen when running on Windows and Linux, for example).
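
A minimal sketch of why this matters: the bytes being hashed are a function of both the string and the charset, so the same string can hash differently on two JVMs. SHA-256 and the helper name `hashOf` are assumptions for illustration; the question does not say which hash is used. The "common default" comments are hedged: on older JDKs the default often was Cp1252 on Western-locale Windows and UTF-8 on Linux, but it depends on locale and JVM settings.

```java
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class EncodingHashDemo {
    // Hypothetical helper: hashes the bytes that the given charset produces for s.
    // SHA-256 is an assumption; the original code's hash function is unknown.
    static byte[] hashOf(String s, Charset charset) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(charset));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");    // a common Linux default
        Charset cp1252 = Charset.forName("Cp1252"); // a common Western Windows default
        String s = "\u00e9"; // "é": C3 A9 in UTF-8, E9 in Cp1252

        // Same string, different charsets: different bytes, hence different hashes.
        System.out.println(Arrays.equals(hashOf(s, utf8), hashOf(s, cp1252))); // false

        // A no-argument s.getBytes() would use this charset instead.
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

Pure-ASCII strings encode identically in both charsets, so the difference only shows up once non-ASCII characters are involved.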

Since an encoding is a mapping between byte sequences and characters, I think some strings and encodings may pass the above test. But I wanted to know whether there are any well-known examples, or good reasons why I shouldn’t rely on hash collisions not happening.
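
The intuition above can be checked mechanically. ISO-8859-1 maps every byte value from 0x00 to 0xFF to exactly one character (U+0000 to U+00FF) and back, so decoding the UTF-8 bytes of any string as ISO-8859-1 yields a second string whose ISO-8859-1 encoding is byte-for-byte identical. The helper name `iso88591Twin` is made up for this sketch.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class CollisionConstruction {
    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    // Hypothetical helper: returns a string whose ISO-8859-1 encoding equals
    // the UTF-8 encoding of s. Lossless because ISO-8859-1 is a one-to-one
    // mapping between bytes 0x00-0xFF and characters U+0000-U+00FF.
    static String iso88591Twin(String s) {
        return new String(s.getBytes(UTF_8), ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "na\u00efve";    // "naïve"; UTF-8 bytes 6E 61 C3 AF 76 65
        String t = iso88591Twin(s); // "naÃ¯ve"
        byte[] a = s.getBytes(UTF_8);
        byte[] b = t.getBytes(ISO_8859_1);
        System.out.println(s.equals(t));         // false
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

For a pure-ASCII input the twin is the same string, so the construction only produces a distinct colliding string when the input contains at least one non-ASCII character.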

Thanks

PS: This is only for encodings supported by JDK 1.6 and not by some made up ones.

1 Answer

Editorial Team · Answer 1 · 2026-06-08T02:46:56+00:00

Yes. To take a simple example, the string “¡” (inverted exclamation mark) encoded as ISO-8859-1 and the string “Ą” (capital A with ogonek) encoded as ISO-8859-2 both become the single-byte sequence A1 (hex). It is more or less obvious that such things happen with the very simple encodings that map each character to a single byte; otherwise they would not be different encodings. It also happens when multibyte encoding schemes are involved: the string “é” encoded as UTF-8 and the string “Ã©” encoded as Cp1252 both become C3 A9.
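
A small sketch that prints the hex bytes for these pairs, so the claims can be checked on any JVM that ships the named charsets (ISO-8859-1 and UTF-8 are guaranteed; ISO-8859-2 and Cp1252 are assumed present, as they are in common JDK builds). The `hex` helper is hypothetical, written only for this check.

```java
import java.nio.charset.Charset;

public class SameBytesExamples {
    // Hypothetical helper: encodes s with the named charset and formats the
    // resulting bytes as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(String s, String charsetName) {
        byte[] bytes = s.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Single-byte encodings: "¡" in ISO-8859-1 and "Ą" in ISO-8859-2.
        System.out.println(hex("\u00a1", "ISO-8859-1")); // A1
        System.out.println(hex("\u0104", "ISO-8859-2")); // A1

        // Multibyte vs single-byte: "é" in UTF-8 and "Ã©" in Cp1252.
        System.out.println(hex("\u00e9", "UTF-8"));        // C3 A9
        System.out.println(hex("\u00c3\u00a9", "Cp1252")); // C3 A9
    }
}
```

The last pair is exactly the kind of input that would make the test in the question pass with Cp1252 and UTF-8.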
