A very common source of encoding errors is that Python 2 silently coerces byte strings to unicode when you combine them with unicode strings. This can cause mixed-encoding problems that are very hard to debug.
For example:
    import urllib
    import webbrowser
    name = raw_input("What's your name?\nName: ")
    greeting = "Hello, %s" % name
    if name == "John":
        greeting += u' (Feliz cumplea\xf1os!)'
    webbrowser.open('http://lmgtf\x79.com?q=' + urllib.quote_plus(greeting))
will fail with a cryptic error if you enter “John”:
    /usr/lib/python2.7/urllib.py:1268: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      return ''.join(map(quoter, s))
    Traceback (most recent call last):
      File "feliz.py", line 7, in <module>
        webbrowser.open('http://lmgtf\x79.com?q=' + urllib.quote_plus(greeting))
      File "/usr/lib/python2.7/urllib.py", line 1273, in quote_plus
        s = quote(s, safe + ' ')
      File "/usr/lib/python2.7/urllib.py", line 1268, in quote
        return ''.join(map(quoter, s))
    KeyError: u'\xf1'
This is particularly hard to track down when the error surfaces far down the road from where the coercion actually happened.
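To make the mechanism concrete, here is a minimal sketch of what the implicit coercion amounts to when written out by hand. It assumes the default encoding is still ASCII (the Python 2 default), and `coerce` is a hypothetical helper name used only for illustration, not part of any library. The snippet uses explicit `b""` and `u""` literals so it runs on both Python 2 and Python 3.

```python
# Sketch: Python 2's implicit str -> unicode coercion, spelled out explicitly.
# `coerce` is a hypothetical helper; it assumes the default encoding is ASCII.

def coerce(byte_string):
    """Decode a byte string the way the implicit coercion does (strict ASCII)."""
    return byte_string.decode("ascii")

# ASCII-only bytes decode without complaint, so the mix goes unnoticed and
# the result is a unicode string:
greeting = coerce(b"Hello, John") + u" (Feliz cumplea\xf1os!)"

# Non-ASCII bytes fail at the point of coercion instead:
try:
    coerce(u"cumplea\xf1os".encode("utf-8"))
    failed = False
except UnicodeDecodeError:
    failed = True
```

In the example above, the name is pure ASCII, so `greeting` quietly becomes a unicode string at the `+=` line and nothing complains until `urllib.quote_plus` receives it.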
How can you configure Python to give a warning or exception immediately when strings are coerced to unicode?
I did a little more research after asking this question and hit on the perfect answer. Armin Ronacher created a wonderful little tool called unicode-nazi. Just install it and run your program like this:
and you get a traceback right where the coercion happened:
If you’re dealing with Python libraries that trigger implicit coercions themselves and you can’t catch the exceptions or otherwise work around them, you can leave out the -Werror flag and at least see a warning printed on stderr when it happens: