I know that a BOM is used for UTF-8 files, but what about text files where every character is 2 bytes? Should I add the byte order mark to them, too?
BOMs were invented for UCS-2 and UTF-16, and only later appropriated by Microsoft (and then XML) for UTF-8. Think about the name: ‘byte order mark’. UTF-8 has only one possible byte order, so it doesn’t need a BOM to reveal the order. The three-byte sequence for U+FEFF in UTF-8 (EF BB BF) has instead become a Unicode signature for file type sniffing.
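To make the difference concrete, here is a minimal Python sketch of BOM sniffing. The function name `sniff_bom` and the `SIGNATURES` table are made up for illustration; only the `codecs.BOM_*` constants come from the standard library. It assumes the input is UTF-8 or UTF-16 and deliberately ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 one.

```python
import codecs

# U+FEFF as it appears on disk in each encoding form.
# For UTF-16 the two byte orders give different bytes, which is the
# whole point of a byte order mark. For UTF-8 there is only one form.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length) based on a leading BOM, or (None, 0)."""
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# A big-endian UTF-16 file that starts with a BOM.
raw = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
encoding, skip = sniff_bom(raw)
text = raw[skip:].decode(encoding)
```

With a 2-byte encoding the BOM tells the reader which byte comes first; with UTF-8 it only says "this is probably UTF-8".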
However, early versions of the XML support in Java did not respond well to a UTF-8 BOM, even though the XML standard allows one. Further, a file with a BOM can’t simply be concatenated onto another file, because U+FEFF in the middle of a file isn’t a BOM; it’s ZWNBSP (zero-width no-break space).
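The concatenation problem is easy to reproduce. The following is a rough Python sketch, not a complete tool: `concat_utf8` is a hypothetical helper, and it assumes every input is UTF-8 with an optional leading BOM.

```python
import codecs

def concat_utf8(parts):
    """Join UTF-8 byte strings, dropping the leading BOM of each part."""
    out = bytearray()
    for part in parts:
        if part.startswith(codecs.BOM_UTF8):
            part = part[len(codecs.BOM_UTF8):]
        out += part
    return bytes(out)

# Two small "files", each saved with a UTF-8 BOM.
a = codecs.BOM_UTF8 + "foo".encode("utf-8")
b = codecs.BOM_UTF8 + "bar".encode("utf-8")

# Naive byte concatenation: the decoder strips only the first BOM,
# so the second one survives as a stray U+FEFF inside the text.
naive = (a + b).decode("utf-8-sig")

# Stripping each BOM before joining gives the expected text.
clean = concat_utf8([a, b]).decode("utf-8")
```

Here `naive` is seven characters long because the invisible U+FEFF sits between "foo" and "bar", while `clean` is the plain six-character "foobar".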