Using Python 3.2, I am trying to decode bytes using str(bytes, "cp1251"), but I get this error:
Traceback (most recent call last):
  File "C:\---\---\---\---.py", line 4, in <module>
    writetemp.write(str(f.read(), "cp1251"))
  File "C:\Python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 19-25: character maps to <undefined>
As you can see, I specified "cp1251", but it attempts to use cp1252.py to decode instead of cp1251.py, which I think causes the error. The same thing occurs if I try "Windows-1251" instead of "cp1251".
Note that what you're getting is a UnicodeEncodeError, not a UnicodeDecodeError. The error doesn't come from your str(f.read(), "cp1251") call. Instead, it comes from the writetemp.write() call.
The str() call decodes the bytes you get from f.read() using cp1251 as the encoding. That works, and it gives you a string (which is Unicode in Python 3). writetemp.write() then has to turn the string back into bytes by encoding it. It does that using the encoding you passed when opening writetemp, or, if you passed none, the default I/O encoding, which Python takes from the system locale settings. You can see which encoding that is by looking at the encoding attribute of the file object. You'll probably find it is cp1252, which has no Cyrillic characters, so the text you just decoded cannot be encoded with it. If you want to write in a particular encoding, don't rely on Python guessing it; explicitly specify the encoding when you open the file.
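A minimal sketch of that advice, under some assumptions: the file names, the sample text, and the choice of UTF-8 for the output are all hypothetical, since the question doesn't show how either file was opened. It decodes cp1251 bytes the same way as in the question, then writes them to a file opened with an explicit encoding.

```python
import os
import tempfile

# Hypothetical sample text ("Привет", written with escapes to stay ASCII-safe).
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

# Hypothetical file locations in a scratch directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.txt")
dst_path = os.path.join(workdir, "writetemp.txt")

# Create a cp1251-encoded input file to stand in for the original data.
with open(src_path, "wb") as f:
    f.write(text.encode("cp1251"))

# Decoding step (bytes -> str): this is the part that already worked.
with open(src_path, "rb") as f:
    decoded = str(f.read(), "cp1251")

# Encoding step (str -> bytes) happens inside write(). Naming the encoding
# here means Python does not fall back to the locale default (e.g. cp1252).
with open(dst_path, "w", encoding="utf-8") as writetemp:
    used_encoding = writetemp.encoding  # the encoding write() will use
    writetemp.write(decoded)

# Read the raw bytes back to confirm what was written.
with open(dst_path, "rb") as f:
    written = f.read()
```

If the output should stay in the same code page as the input, passing encoding="cp1251" to the second open() call works the same way. Printing writetemp.encoding on a file opened without an encoding argument shows which default was being used.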