For some reason, Python seems to be having issues with BOM when reading unicode

Question

Asked: May 25, 2026

For some reason, Python seems to be having issues with BOM when reading unicode strings from a UTF-8 file. Consider the following:

with open('test.py') as f:
   for line in f:
      print unicode(line, 'utf-8')

Seems straightforward, doesn’t it?

That’s what I thought until I ran it from the command line and got:

UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff'
in position 0: character maps to <undefined>

A quick Google search suggested that the BOM has to be stripped manually:

import codecs
with open('test.py') as f:
   for line in f:
      print unicode(line.replace(codecs.BOM_UTF8, ''), 'utf-8')

This version runs fine. However, I’m struggling to see any merit in having to do this by hand.

Is there a rationale behind the behavior described above? In contrast, UTF-16 works seamlessly.

Editorial Team · Answer 1 · 2026-05-25T03:22:31+00:00

The 'utf-8-sig' encoding will consume the BOM signature on your behalf.
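
As for the rationale: the plain 'utf-8' codec decodes the BOM like any other character, so it comes through as u'\ufeff'. The error in the question is an encode error, which suggests the decode itself succeeded and the failure came when print tried to write U+FEFF to a console whose code page has no mapping for it. The 'utf-16' codec, by contrast, reads the BOM to determine byte order and drops it, which is why UTF-16 appears to work seamlessly.

A minimal sketch of the difference follows. The helper name read_lines and the temporary sample file are assumptions made up for illustration; they are not part of the question's code. io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
import codecs
import io
import os
import tempfile


def read_lines(path, encoding):
    """Return the file's lines, decoded with the given encoding."""
    # io.open decodes while reading and works on Python 2 and 3 alike.
    with io.open(path, encoding=encoding) as f:
        return f.read().splitlines()


# Hypothetical sample file: a UTF-8 BOM followed by two lines of text.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(codecs.BOM_UTF8 + u'first line\nsecond line\n'.encode('utf-8'))

plain = read_lines(sample_path, 'utf-8')       # BOM survives as U+FEFF
signed = read_lines(sample_path, 'utf-8-sig')  # BOM is consumed by the codec
os.remove(sample_path)

# repr() is used so the invisible U+FEFF character shows up in the output.
print(repr(plain[0]))
print(repr(signed[0]))
```

'utf-8-sig' also reads files that have no BOM, so it should be a safe default when the input may or may not carry one.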
