Odd error with unicode for me. I was dealing with unicode fine, but when I ran it this morning one item u’\u201d’ gave error and gives me
UnicodeError: ASCII encoding error: ordinal not in range(128)
I looked up the code and apparently its utf-32 but when I try to decode it in the interpreter:
c = u'\u201d'
c.decode('utf-32', 'replace')
Or any other operation with it for that matter, it just doesnt recognize it in any codec but yet I found it as “RIGHT DOUBLE QUOTATION MARK”
I get:
Traceback (most recent call last):
File "<pyshell#154>", line 1, in <module>
c.decode('utf-32')
File "C:\Python27\lib\encodings\utf_32.py", line 11, in decode
return codecs.utf_32_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 0: ordinal not in range(128)
You already have a unicode string, there is no need to decode it to a unicode string again.
What happens in that case is that python helpfully tries to first encode it for you, so that you can then decode it from
utf-32. It uses the default encoding to do so, which happens to be ASCII. Here is an explicit encode to show you the exception raised in that case:In short, when you have a unicode literal like
u'', there is no need to decode it.Read up on the unicode, encodings, and default settings in the Python Unicode HOWTO. Another invaluable article on the subject is Joel Spolsky’s Minimun Unicode knowledge post.