The following code causes the well-known “UnicodeDecodeError: ‘ascii’ codec can’t decode” error:
import xml.sax
import io
parser = xml.sax.make_parser()
parser.parse(io.StringIO(u'<a>é</a>'))
While
import xml.sax
parser = xml.sax.make_parser()
parser.parse(open('foo'))
works (the content of file “foo” is <a>é</a>).
I need to parse an XML string in my case, not a file.
Is there any solution to my problem? Thanks.
A file contains bytes, and must have some encoding to store Unicode characters, so use a BytesIO object instead:
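The snippet that originally followed this sentence is not preserved here, so what follows is a minimal sketch of the approach rather than the answerer's exact code. It encodes the Unicode string to UTF-8 bytes and wraps them in a BytesIO object before handing them to the parser. The TextCollector handler and the parse_xml_text helper are hypothetical names added only so the result can be inspected.

```python
# coding: utf8
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers character data so it can be inspected."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them all.
        self.chunks.append(content)


def parse_xml_text(text):
    """Parse a Unicode XML string by feeding the parser UTF-8 encoded bytes."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Encode explicitly: the parser receives bytes, just as it would from a file.
    parser.parse(io.BytesIO(text.encode('utf8')))
    return u''.join(handler.chunks)


result = parse_xml_text(u'<a>é</a>')
```

UTF-8 is a safe choice here because the document carries no XML declaration, and an XML parser assumes UTF-8 when neither a declaration nor a byte order mark says otherwise.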
Note:
#coding: utf8 specifies the encoding of the source file; .encode('utf8') specifies the encoding of the Unicode string to be stored in the BytesIO object. Technically, passing a non-Unicode (byte) string to BytesIO will work as well, since byte strings will be in the source file encoding already, but the explicit .encode('utf8') makes the intent clearer. The source file encoding and the BytesIO encoding could be different.
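To illustrate that the two encodings are independent, here is a hedged sketch in which the bytes handed to the parser are ISO-8859-1 regardless of how the source file itself is saved. Because the data is no longer UTF-8, the XML declaration has to name the encoding. The CollectText handler and the variable names are illustrative assumptions, not part of the original answer.

```python
# coding: utf8
import io
import xml.sax


class CollectText(xml.sax.ContentHandler):
    """Hypothetical handler that records the character data it receives."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# The source file is UTF-8 (see the coding line), but the bytes given to the
# parser are ISO-8859-1; the XML declaration tells the parser how to decode them.
document = u'<?xml version="1.0" encoding="iso-8859-1"?><a>é</a>'
latin1_bytes = document.encode('iso-8859-1')

handler = CollectText()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(latin1_bytes))

parsed_text = u''.join(handler.chunks)
```

In ISO-8859-1 the character é is the single byte 0xE9, which is not valid UTF-8 on its own, so the same bytes without the declaration would most likely be rejected by the parser.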