mystr = 'aaaa'
myvar = u'My string %s' % str(mystr)
Can this be a problem in the future? I’m messing around with some in-house code that uses the email modules in Python and found some code like this. mystr will always contain only ASCII characters, since it comes from a list of predefined, ASCII-only strings.
I didn’t write the code, and whether it says str(mystr) or just mystr doesn’t change the question.
Does the first snippet give me a safe unicode object, or do I have to do
mystr = u'aaaa'
myvar = u'My string %s' % mystr
or
mystr = 'aaaa'
myvar = u'My string %s' % unicode(mystr)
?
(I know this is not the correct way of doing it, and I know I should handle the exceptions. I’m only asking whether the first snippet returns a valid unicode object, or whether Python messes up its internals or something when doing it.)
Try putting actual non-ASCII symbols in the strings (like umlauts or Cyrillic) and watch all hell break loose. 🙂
The problem is that you will most likely write your application, and then one bright, shiny day some Russian or German user will type in her name and suddenly get an Internal Server Error for having a non-ASCII symbol in her name.
No, there will be no problem. And IMHO this is a fault in Python, because it is a bug waiting to bite. This should have been a fatal error but, I guess for historical reasons, it isn’t.
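To make both answers concrete, here is a minimal sketch of what is going on. In Python 2, formatting a byte string into a unicode format string decodes the bytes implicitly as ASCII; the sketch below does the same decode explicitly, so it also runs on Python 3. Note that to_text is a hypothetical helper introduced only for illustration, not something from the code in the question, and the sample name is made up.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='ascii'):
    """Hypothetical helper: decode byte strings explicitly, pass text through."""
    if isinstance(value, bytes):  # on Python 2, bytes is the same type as str
        return value.decode(encoding)
    return value

# ASCII-only bytes decode cleanly, so the result is a valid unicode object.
mystr = b'aaaa'
myvar = u'My string %s' % to_text(mystr)

# Non-ASCII bytes are where the ASCII decode blows up.
name_bytes = u'M\xfcller'.encode('utf-8')
try:
    to_text(name_bytes)
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

# Decoding with the encoding the bytes were actually written in avoids the error.
name = u'My string %s' % to_text(name_bytes, 'utf-8')
```

So with guaranteed ASCII-only input the first snippet is fine; the risk only shows up once non-ASCII bytes reach that line, which is why decoding at the boundary with a known encoding is the safer habit.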