I’m working on a Python script that reads an XML file encoded with UTF-8,


Asked: May 19, 2026


I’m working on a Python script that reads an XML file encoded with UTF-8, does some manipulation with it and saves it to Google’s Datastore (it’s an App Engine program).

The way I’m reading and parsing the files is just with file.readline() and a few regular expressions. The only problem is that the file I’m working with has characters from a lot of different languages in it, so for example, it might have an é or Å or Russian or Greek characters.

I was getting an error like this at first: “UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0xd0 in position 0: ordinal not in range(128).” I then tried switching the encoding used to open the file to “ISO-8859-15”, which gets rid of the error, but the output characters aren’t displayed correctly.

So my question is: how do I work with a file encoded in UTF-8 in Python without Python getting stuck on all of the special characters in the file? I hope this was clear enough, and thanks in advance for any advice.


1 Answer

Editorial Team · Answer 1 · 2026-05-19T11:03:41+00:00


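Specify the UTF-8 encoding when you call str.decode on the bytes you read, so they are decoded as UTF-8 rather than with the default ASCII codec: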

>>> print '\xe2\x99\x9e'.decode('utf-8')
♞

That’s supposed to be a chess piece but it’s too tiny to see 🙂
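Since the question reads the file line by line, it may be simpler to let the file object do the decoding. Here is a minimal sketch, assuming Python 2.6 or later, where io.open takes an encoding argument (in Python 3 it is the same as the built-in open). The `<name>` elements, the sample text, and the read_names helper are made up for illustration; they are not from the asker's file.

```python
import io
import os
import re
import tempfile

# Hypothetical sample data standing in for the asker's XML file.
SAMPLE = u'<name>caf\u00e9</name>\n<name>\u0414\u0430</name>\n'


def read_names(path):
    """Return the text of every <name> element, decoding the file as UTF-8."""
    pattern = re.compile(u'<name>(.*?)</name>', re.UNICODE)
    names = []
    # io.open decodes while reading, so each line is already Unicode text.
    with io.open(path, 'r', encoding='utf-8') as handle:
        for line in handle:
            names.extend(pattern.findall(line))
    return names


# Write the sample to a temporary file as UTF-8, then read it back.
fd, path = tempfile.mkstemp(suffix='.xml')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(SAMPLE)
    names = read_names(path)
finally:
    os.remove(path)

# Decoding the same bytes as ISO-8859-15 raises no error,
# but each multi-byte character comes out as the wrong characters.
wrong = SAMPLE.encode('utf-8').decode('iso-8859-15')
```

Here names holds Unicode strings with the é and the Cyrillic letters intact, while wrong is what you get from the ISO-8859-15 attempt: no exception, but the text is not what the file contains. Encode back to UTF-8 only at the point where you write the data out.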
