If I read a unicode file using the following:
f = open(r'file.txt','rU')
raw = f.read()
How can I get the file read as extended ASCII, that is, have \xc3\xaa converted to ê correctly and every non-displayable character replaced with a default character (say ?)?
I also have the following:
# Create a file called sitecustomize.py in c:\python27\Lib\site-packages.
import sys
sys.setdefaultencoding('iso-8859-1')
which I’m not sure whether I need to change.
For some reason I can’t paste ê into the Python console (DOS prompt on Windows), but I can do:
>>> s = u'La Pe\xf1a'
>>> print s
La Peña
Anybody have any idea how to do this?
in python2
in py3 just
To clear up confusion, there’s no such thing as a “unicode file”. Unicode is a mathematical abstraction, and files are bytes on your disk. To convert those bytes into an in-memory representation of Unicode code points, Python needs to know how to interpret them. This interpretation is called an “encoding”, and judging by the \xc3\xaa in your post (the UTF-8 byte sequence for ê), your file appears to be encoded as UTF-8. So you have to tell Python that.
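As a rough sketch of what "telling Python" could look like, assuming the file really is UTF-8: `io.open` takes an `encoding` argument on both Python 2.6+ and Python 3. The helper names `read_text` and `to_latin1_bytes` are made up for this illustration. The second helper covers the "default character" part of the question by encoding to ISO-8859-1 with the `replace` error handler, which substitutes `?` for anything that encoding cannot represent; it does not touch control characters, so those would still need separate handling.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a file and decode its bytes into a unicode string.

    The default of UTF-8 is an assumption about the file; pass another
    encoding name if the file was written differently.
    """
    # io.open decodes while reading, so the result is text, not raw bytes.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def to_latin1_bytes(text):
    """Encode text as ISO-8859-1 ("extended ASCII").

    Characters with no ISO-8859-1 equivalent are replaced by '?'.
    """
    return text.encode("iso-8859-1", "replace")
```

With this, something like `raw = read_text('file.txt')` would give a unicode string in which \xc3\xaa has become ê, and `to_latin1_bytes(raw)` would give single-byte output for a console or file that expects ISO-8859-1.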