I have a string, say s = 'Chocolate Moelleux-M\xe8re'. When I run:
In [14]: unicode(s)
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 20: ordinal not in range(128)
Similarly, when I try to decode it with s.decode(), I get the same error:
In [13]: s.decode()
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 20: ordinal not in range(128)
How can I decode such a string into unicode?
I have run into this problem one too many times. In my case the data contained strings in several different encodings, so I wrote a method that decodes a string heuristically, based on characteristic features of each encoding.
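The method itself is not shown here, so the following is only a minimal sketch of one possible heuristic, not the original code. It assumes a hypothetical helper named decode_heuristically and an assumed candidate order of utf-8, then cp1252, then latin-1: strict encodings are tried first, and latin-1 comes last because it accepts every byte value. It is written so that it should behave the same on Python 2 and Python 3.

```python
def decode_heuristically(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: return (text, encoding) for the first candidate
    encoding that decodes the byte string without error."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    # Reached only if no candidate worked (e.g. latin-1 was left out of the
    # list). latin-1 maps every byte to a code point, so this cannot fail.
    return raw.decode("latin-1"), "latin-1"


s = b'Chocolate Moelleux-M\xe8re'
text, used = decode_heuristically(s)
```

For the string in the question, the byte 0xe8 is not valid UTF-8 in that position, so the sketch falls through to cp1252, where 0xe8 is the character è. A heuristic like this can guess wrong when several encodings accept the same bytes, so passing the known encoding explicitly, for example s.decode('latin-1'), is preferable whenever the source of the data is known.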
To add to this, the following link gives a good explanation of why encoding matters: Why we need sys.setdefaultencoding in py script