I’m trying to recognize a BOM for UTF-8 when reading a file. Of course,

Question

Asked: May 25, 2026 (2026-05-25T01:40:49+00:00)

I’m trying to recognize a BOM for UTF-8 when reading a file. Of course, Java likes to deal with 16-bit chars, and the BOM is a sequence of eight-bit bytes.

My test code looks like:

public void testByteOrderMarks() {
    System.out.println("test byte order marks");

    byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'a', (byte) 'b',(byte) 'c'};
    String test = new String(bytes,  Charset.availableCharsets().get("UTF-8"));
    System.out.printf("test len: %s  value %s\n", test.length(), test);
    String three = test.substring(0,3);
    System.out.printf("len %d  >%s<\n", three.length(), three);
    for (int i = 0; i < test.length();i++) {
        byte b = bytes[i];
        char c = test.charAt(i);
        System.out.printf("b: %s %x c: %s %x\n", (char) b, b,  c, (int) c); 
    }
}

and the result is:

test byte order marks
test len: 4 value ?abc
len 3 >?ab<
b: ? ef c: ? feff
b: ? bb c: a 61
b: ? bf c: b 62
b: a 61 c: c 63

I can’t figure out why the length of “test” is 4 and not 6.
I can’t figure out why I don’t pick up each 8 bit byte to do the comparison.

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-25T01:40:50+00:00

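A character is a character. The Byte Order Mark is the Unicode character U+FEFF; in Java it is the char '\uFEFF'. In UTF-8 that one character is encoded as the three bytes EF BB BF, so once the bytes have been decoded the BOM is a single char. That is why test has length 4 (the BOM plus "abc") and not 6, and why bytes[i] and test.charAt(i) do not line up in your loop. There is no need to delve into bytes. Just read the first character of the file (decoded as UTF-8), and if it matches '\uFEFF' it is the BOM. If it doesn’t match, the file was written without a BOM.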

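private static final char BOM = '\uFEFF';    // Unicode Byte Order Mark

// readFirstLineOfFile is a placeholder for your own code; it must decode the file as UTF-8.
String firstLine = readFirstLineOfFile("filename.txt");
if (!firstLine.isEmpty() && firstLine.charAt(0) == BOM) {
    // We have a BOM
} else {
    // No BOM present (or the first line is empty).
}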
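
If it helps to see the whole thing run, here is a minimal sketch of the same check. The class name BomDemo and the helpers stripBom and readFirstLine are made up for this example (they are not from any library), and it assumes the file really is UTF-8. It decodes the six bytes from the question, then writes them to a temporary file and reads the first line back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BomDemo {
    private static final char BOM = '\uFEFF'; // Unicode Byte Order Mark

    // Hypothetical helper: returns s without a leading BOM, or s unchanged.
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    // Hypothetical helper: first line of the file decoded as UTF-8 ("" if the file is empty).
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }

    public static void main(String[] args) throws IOException {
        // Same six bytes as in the question: EF BB BF followed by "abc".
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'a', (byte) 'b', (byte) 'c'};

        // Decoding turns the three BOM bytes into the single char U+FEFF.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.length());            // 4
        System.out.println(decoded.charAt(0) == BOM);    // true
        System.out.println(stripBom(decoded));           // abc

        // The same check on a real file, using a temporary file for the demo.
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        try {
            Files.write(tmp, bytes);
            String firstLine = readFirstLine(tmp);
            System.out.println(stripBom(firstLine));     // abc
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The comparison is done on chars after decoding, so the raw bytes never have to be inspected. If the file might be in some other encoding, the first char will not reliably be U+FEFF and this check would need a different approach.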
