Question

Asked: May 14, 2026 (2026-05-14T22:44:54+00:00)

I’m confused. Consider this code, which works the way I expect:

>>> foo = u'Émilie and Juañ are turncoats.'
>>> bar = "foo is %s" % foo
>>> bar
u'foo is \xc3\x89milie and Jua\xc3\xb1 are turncoats.'

And this code, which does not work at all the way I expect:

>>> try:
...     raise Exception(foo)
... except Exception as e:
...     foo2 = e
... 
>>> bar = "foo2 is %s" % foo2
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Can someone explain what’s going on here? Why does it matter whether the unicode data is in a plain unicode string or stored in an Exception object? And why does this fix it:

>>> bar = u"foo2 is %s" % foo2
>>> bar
u'foo2 is \xc3\x89milie and Jua\xc3\xb1 are turncoats.'

I am quite confused! Thanks for the help!

UPDATE: My coding buddy Randall has added to my confusion in an attempt to help me! Send in the reinforcements to explain how this is supposed to make sense:

>>> class A:
...     def __str__(self): return "string"
...     def __unicode__(self): return "unicode"
... 
>>> "%s %s" % (u'niño', A())
u'ni\xc3\xb1o unicode'
>>> "%s %s" % (A(), u'niño')
u'string ni\xc3\xb1o'

Note that the order of the arguments here determines which method is called!

1 Answer

Editorial Team · Answer 1 · 2026-05-14T22:44:54+00:00

The Python 2 documentation on string formatting operations has the answer:

If format is a Unicode object, or if any of the objects being converted using the %s conversion are Unicode objects, the result will also be a Unicode object.

foo = u'Émilie and Juañ are turncoats.'
bar = "foo is %s" % foo

This works, because foo is a unicode object. This causes the above rule to take effect and results in a Unicode string.

bar = "foo2 is %s" % foo2

In this case, foo2 is an Exception object, not a unicode object, so the rule above does not apply. The interpreter calls str() on it, which has to encode the unicode message using your default encoding. That encoding is ascii, which cannot represent those characters, so the conversion fails with a UnicodeEncodeError.
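The failing step is the implicit encode. Here is a minimal sketch of that step on its own, written for Python 3 so it can be run today. The explicit `.encode('ascii')` call and the helper name `try_ascii` are my own stand-ins for what Python 2's `str()` did implicitly with the default codec; they are not part of the original session.

```python
# Python 3 sketch: the explicit encode below stands in for the implicit
# ascii encoding that Python 2's str() performed on a unicode message.
text = 'Émilie and Juañ are turncoats.'

def try_ascii(s):
    """Return the ASCII-encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return s.encode('ascii')
    except UnicodeEncodeError as err:
        return err

result = try_ascii(text)
# Report the exception type and the index of the first offending character.
print(type(result).__name__, result.start)
```

Text that stays within ASCII goes through the same helper without trouble, which is why this kind of bug tends to show up only once real names appear in the data.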

bar = u"foo2 is %s" % foo2

Here it works again, because the format string is a unicode object. So the interpreter tries to convert foo2 to a unicode object as well, which succeeds.

As to Randall’s question: this surprises me too. However, the behaviour is consistent with the standard (reformatted for readability):

%s converts any Python object using str(). If the object or format provided is a unicode string, the resulting string will also be unicode.

How that unicode result is produced is left unspecified, so both of these are legal:

call __str__, decode back to a Unicode string, and insert it into the output string
call __unicode__ and insert the result directly into the output string

The mixed behaviour of the Python interpreter is rather hideous indeed. I would consider this to be a bug in the standard.
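One model that is consistent with Randall's output is that formatting proceeds left to right using `__str__`, and switches to `__unicode__` for the remaining arguments once the first unicode argument is seen. The sketch below simulates that model in Python 3. Everything in it is hypothetical scaffolding: `Text` is a stand-in for Python 2's `unicode` type, `format_left_to_right` is not an interpreter function, and `__unicode__` has no special meaning in Python 3, so it is called explicitly.

```python
class Text(str):
    """Hypothetical stand-in for Python 2's unicode type."""

class A:
    def __str__(self):
        return "string"

    def __unicode__(self):  # not special in Python 3; called explicitly below
        return "unicode"

def format_left_to_right(args):
    """Model: use __str__ until a Text argument appears, then prefer __unicode__."""
    unicode_mode = False
    parts = []
    for arg in args:
        if isinstance(arg, Text):
            # The first unicode argument flips the mode for everything after it.
            unicode_mode = True
            parts.append(str(arg))
        elif unicode_mode and hasattr(arg, "__unicode__"):
            parts.append(arg.__unicode__())
        else:
            parts.append(str(arg))
    return " ".join(parts)

first = format_left_to_right((Text("ni\xf1o"), A()))
second = format_left_to_right((A(), Text("ni\xf1o")))
print(ascii(first))
print(ascii(second))
```

Under this model an argument that was already converted before the switch keeps its `__str__` result, which matches the order dependence shown in the question.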

Edit: Quoting What’s New in Python 3.0, emphasis mine:

Everything you thought you knew about binary data and Unicode has changed.

[…]

As a consequence of this change in philosophy, pretty much all code that uses Unicode, encodings or binary data most likely has to change. The change is for the better, as in the 2.x world there were numerous bugs having to do with mixing encoded and unencoded text.
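For comparison, here is a sketch of the question's second example as it would look in Python 3, where `str` is always text and there is no implicit ascii step. The `mixed_ok` flag is just a name I picked to record the outcome of mixing bytes and text.

```python
foo = 'Émilie and Juañ are turncoats.'

try:
    raise Exception(foo)
except Exception as e:
    foo2 = e

# str(foo2) is already text in Python 3, so no implicit encoding happens.
bar = "foo2 is %s" % foo2
print(ascii(bar))  # ascii() keeps the printed form safe on any terminal

# Mixing encoded bytes with text is an explicit error, not a silent conversion.
try:
    b"foo2 is " + foo
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```

The formatting line that raised UnicodeEncodeError in the question succeeds here, and the only way to get bytes involved is to encode or decode explicitly.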
