How do I configure a minidom instance to have a unicode codec?
Processing this snippet in an XML file:
<title type="text">ME850单片机开发实验仪(增强配置)(产品浏览)-伟纳电子-http://www.willar.com/</title>
produces this error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-12: ordinal not in range(128)
Update: this works as expected in Python 3; apparently it is a known limitation of 2.x.
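In Python 2.x, minidom can only parse byte strings. If you hand it a unicode string, the string is implicitly encoded with the default ASCII codec, which is what raises the UnicodeEncodeError. Either don’t decode your document in the first place (a more specific suggestion would require the code you’re running), or encode it to UTF-8 before passing it to minidom.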
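As a rough sketch of the "encode it to UTF-8" option, assuming you already hold the document as a character string (the variable names `xml_text` and `xml_bytes` are made up for illustration, and the literal is just the snippet from the question), something like this should work:

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom

# Assumption: the document has already been decoded to a character string.
xml_text = u'<title type="text">ME850单片机开发实验仪(增强配置)(产品浏览)-伟纳电子-http://www.willar.com/</title>'

# Encode to UTF-8 bytes first, so the parser never has to fall back
# on the ASCII codec.
xml_bytes = xml_text.encode("utf-8")

doc = minidom.parseString(xml_bytes)
title_text = doc.documentElement.firstChild.data
```

One caveat: if your real document starts with an XML declaration that names a different encoding, the bytes you pass in have to match that declaration, so in that case it is probably simpler not to decode the file at all.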
Alternatively, you can switch to Python 3.x, where minidom can handle bytes as well as character strings.