I’m working on a Python script that reads an XML file encoded with UTF-8, does some manipulation with it and saves it to Google’s Datastore (it’s an App Engine program).
The way I’m reading and parsing the files is just with file.readline() and a few regular expressions. The only problem is that the file I’m working with has characters from a lot of different languages in it, so for example, it might have an é or Å or Russian or Greek characters.
I was getting an error like this at first: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)". I then tried switching the encoding on the file open to “ISO-8859-15”, which gets rid of the error, but the output characters aren’t displayed correctly.
So my question is: how do I work with a file encoded in UTF-8 in Python without Python getting stuck on all of the special characters in the file? I hope this was clear enough, and thanks in advance for any advice.
Specify the UTF-8 encoding on str.decode.
That’s supposed to be a chess piece but it’s too tiny to see 🙂
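As a minimal sketch of that advice (not the answerer's original snippet), the example below decodes the same bytes three ways and then reads a small file as UTF-8. The sample bytes, the helper name read_lines_utf8 and the temporary file are assumptions made for illustration. On Python 2 the decode method lives on str; on Python 3 it lives on bytes. io.open with an encoding argument is used here because it exists on both.

```python
import io
import os
import tempfile

# UTF-8 bytes for the Cyrillic letter Zhe (U+0416) followed by e-acute (U+00E9).
# The first byte is 0xd0, the same byte named in the error message above.
raw = b"\xd0\x96\xc3\xa9"

# Decoding as ASCII fails because 0xd0 is outside range(128).
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# ISO-8859-15 maps every byte to one character, so nothing is raised,
# but the four bytes become four wrong characters instead of two right ones.
garbled = raw.decode("iso-8859-15")

# Naming the real encoding gives the two intended characters.
text = raw.decode("utf-8")


def read_lines_utf8(path):
    """Return the lines of a UTF-8 file as unicode strings (hypothetical helper)."""
    # io.open decodes while reading, so each line is already unicode text.
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip(u"\n") for line in handle]


# Write a tiny sample file (illustrative content) and read it back.
fd, demo_path = tempfile.mkstemp(suffix=".xml")
os.close(fd)
with io.open(demo_path, "wb") as out:
    out.write(b"<name>\xd0\x96\xc3\xa9</name>\n")
lines = read_lines_utf8(demo_path)
os.remove(demo_path)
```

Once the lines are unicode strings, regular expressions can be applied to them directly; on Python 2 it would probably help to write the patterns as unicode literals too, so that no implicit ASCII conversion takes place.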