I’m accessing an Excel file through Python to adjust the encoding of some cells. My code so far:
from xlrd import *
from xlwt import *
wb = open_workbook('a.xls')
s = wb.sheets()[0]
for row in range(s.nrows):
    e = s.cell(row,9).value
    r = s.cell(row,11).value
    print e,' ',r.decode('cp1251')
When running this code I get this error:
Traceback (most recent call last):
  File "C:\Users\pem\workspace\a\src\a.py", line 17, in <module>
    print e,' ',r.decode('cp1251')
  File "C:\Python27\lib\encodings\cp1251.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
  File "C:\Python27\lib\encodings\cp1251.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xf6' in position 23: character maps to <undefined>
I know that e is English text and r is the Russian translation in cp1251 encoding.
I assume you’re using Python 2. (Unicode handling is different in Python 3.)
Use r.decode('cp1251') to decode r from your encoding into unicode. This will give you an object of type unicode.
Note that if you try to print it, it will first be implicitly encoded, i.e. converted back to str using the console's encoding (or the default, ascii). If your console supports unicode, you can print it by explicitly encoding it to the console's encoding first.
Note that Python’s str consists of 8-bit bytes (characters), while unicode represents an actual string where one character can be any unicode character. (In Python 3, str was replaced by bytes and unicode was renamed to str to make this more obvious.) .decode() on a str allows you to get a “meaningful” unicode string out of some bytes (that you read from somewhere) using an encoding you specify, while .encode() on a unicode object does the opposite: it allows you to get the byte representation of a string using an encoding of your choice.
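As a minimal sketch of the decode/encode round trip described above: the sample bytes and the helper names to_text and to_bytes are made up for illustration, and the literals are written so the snippet should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative only: the sample data and helper names are assumptions,
# not part of the original question or of xlrd.

def to_text(raw, encoding='cp1251'):
    """Decode a byte string into text (unicode in Python 2, str in Python 3)."""
    return raw.decode(encoding)

def to_bytes(text, encoding='utf-8'):
    """Encode text into bytes; characters the encoding lacks become '?'."""
    return text.encode(encoding, 'replace')

raw = b'\xef\xf0\xe8\xe2\xe5\xf2'   # the Russian word "privet" as cp1251 bytes
text = to_text(raw)                  # six Cyrillic characters
utf8 = to_bytes(text, 'utf-8')       # the same word as UTF-8 bytes

# Strict encoding of a character the target code page does not contain
# raises UnicodeEncodeError, the same kind of error as in the traceback.
try:
    u'\xf6'.encode('cp1251')
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc
```

In Python 2, passing the result of to_bytes(text, sys.stdout.encoding or 'utf-8') to print would hand the console explicit bytes instead of relying on the implicit conversion; whether the characters display correctly still depends on the console.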