A very common source of encoding errors is that python 2 will silently coerce

Question

0

Asked: June 11, 20262026-06-11T22:45:18+00:00 2026-06-11T22:45:18+00:00

A very common source of encoding errors is that python 2 will silently coerce

0

A very common source of encoding errors is that python 2 will silently coerce strings to unicode when you add them together with unicode. This can cause mixed encoding problems and can be very hard to debug.

For example:

import urllib
import webbrowser
name = raw_input("What's your name?\nName: ")
greeting = "Hello, %s" % name
if name == "John":
    greeting += u' (Feliz cumplea\xf1os!)'
webbrowser.open('http://lmgtf\x79.com?q=' + urllib.quote_plus(greeting))

will fail with a cryptic error if you enter “John”:

/usr/lib/python2.7/urllib.py:1268: UnicodeWarning: Unicode equal comparison faile
d to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
Traceback (most recent call last):
  File "feliz.py", line 7, in <module>
    webbrowser.open('http://lmgtf\x79.com?q=' + urllib.quote_plus(greeting))
  File "/usr/lib/python2.7/urllib.py", line 1273, in quote_plus
    s = quote(s, safe + ' ')
  File "/usr/lib/python2.7/urllib.py", line 1268, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xf1'

It’s particularly hard to track down when the actual errors come far down the road from where the actual coercion happened.

How can you configure python to give a warning or exception immediately when strings are coerced to unicode?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-11T22:45:20+00:00

I did a little more research after asking this question and hit on the perfect answer. Armin Ronacher created a wonderful little tool called unicode-nazi. Just install it and run your program like this:

python -Werror -municodenazi myprog.py

and you get a traceback right where the coercion happened:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "SITE-PACKAGES/unicodenazi.py", line 128, in <module>
    main()
  File "SITE-PACKAGES/unicodenazi.py", line 119, in main
    execfile(sys.argv[0], main_mod.__dict__)
  File "myprog.py", line 4, in <module>
    print foo()
  File "myprog.py", line 2, in foo
    return 'bar' + u'baz'
  File "SITE-PACKAGES/unicodenazi.py", line 34, in warning_decode
    stacklevel=2)
UnicodeWarning: Implicit conversion of str to unicode

If you’re dealing with python libraries that trigger implicit coercions themselves and you can’t catch the exceptions or otherwise work around them, you can leave out the -Werror:

python -municodenazi myprog.py

and at least see a warning printed out on stderr when it happens:

/SITE-PACKAGES/unicodenazi.py:119: UnicodeWarning: Implicit conversion of str to unicode
  execfile(sys.argv[0], main_mod.__dict__)
barbaz

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

A very common source of encoding errors is that python 2 will silently coerce

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply