Why is the below item failing? Why does it succeed with "latin-1" codec? o

Question

0

Asked: May 21, 20262026-05-21T03:18:43+00:00 2026-05-21T03:18:43+00:00

Why is the below item failing? Why does it succeed with "latin-1" codec? o

0

Why is the below item failing? Why does it succeed with "latin-1" codec?

o = "a test of \xe9 char" #I want this to remain a string as this is what I am receiving
v = o.decode("utf-8")

Which results in:

 Traceback (most recent call last):  
 File "<stdin>", line 1, in <module>  
 File "C:\Python27\lib\encodings\utf_8.py",
 line 16, in decode
     return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError:
 'utf8' codec can't decode byte 0xe9 in position 10: invalid continuation byte

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-21T03:18:44+00:00

In binary, 0xE9 looks like 1110 1001. If you read about UTF-8 on Wikipedia, you’ll see that such a byte must be followed by two of the form 10xx xxxx. So, for example:

>>> b'\xe9\x80\x80'.decode('utf-8')
u'\u9000'

But that’s just the mechanical cause of the exception. In this case, you have a string that is almost certainly encoded in latin 1. You can see how UTF-8 and latin 1 look different:

>>> u'\xe9'.encode('utf-8')
b'\xc3\xa9'
>>> u'\xe9'.encode('latin-1')
b'\xe9'

(Note, I’m using a mix of Python 2 and 3 representation here. The input is valid in any version of Python, but your Python interpreter is unlikely to actually show both unicode and byte strings in this way.)

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Why is the below item failing? Why does it succeed with "latin-1" codec? o

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply