I’m using Python 2.6.5 and when I run the following in the Python shell, I get:
>>> print u'Andr\xc3\xa9'
AndrÃ©
>>> print 'Andr\xc3\xa9'
André
>>>
What's the explanation for the above? Given u'Andr\xc3\xa9', how can I display that value properly in an HTML page so that it shows André instead of AndrÃ©?
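'\xc3\xa9' is the UTF-8 encoding of the unicode character u'\u00e9' (which can also be specified as u'\xe9'). In a unicode literal, each \x escape denotes a code point rather than a byte, so u'Andr\xc3\xa9' contains the two characters U+00C3 (Ã) and U+00A9 (©) instead of é. So you can use u'Andr\u00e9' or u'Andr\xe9'. You can convert from one to the other with decode and encode: 'Andr\xc3\xa9'.decode('utf-8') gives u'Andr\xe9', and u'Andr\xe9'.encode('utf-8') gives 'Andr\xc3\xa9'.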
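As a rough sketch of that conversion, and of one way to repair a unicode string that was built from UTF-8 bytes by mistake, something like the following should work. The variable names are only illustrative, and the b'...' prefix is used so the snippet runs on both Python 2.6+ and Python 3.3+ (in Python 2 it is the same as a plain '...' literal).

```python
# Illustrative names; none of these come from the original question.
utf8_bytes = b'Andr\xc3\xa9'   # byte string holding the UTF-8 encoding
text = u'Andr\xe9'             # unicode string, same as u'Andr\u00e9'

# Convert in both directions.
decoded = utf8_bytes.decode('utf-8')   # bytes -> unicode
encoded = text.encode('utf-8')         # unicode -> bytes

# The string from the question holds U+00C3 and U+00A9 instead of U+00E9.
mangled = u'Andr\xc3\xa9'

# Latin-1 maps code points 0-255 to identical byte values, so encoding
# with it recovers the original bytes, which can then be decoded as UTF-8.
repaired = mangled.encode('latin-1').decode('utf-8')
```

The Latin-1 round trip only helps when every character in the mangled string is below U+0100; it is better to decode the bytes as UTF-8 at the point where they enter the program.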
Note that print 'Andr\xc3\xa9' gave you the expected result only because your terminal (your system's default encoding) is UTF-8: the bytes are written out as they are and the terminal interprets them. For example, on Windows, where the console typically uses a different encoding, I get garbled output instead.
As for outputting HTML, it depends on which web framework you use and what encoding you declare for the HTML page. Some frameworks (e.g. Django) will convert unicode values to the correct encoding automatically, while others will require you to do so manually.
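For the manual case, a minimal sketch might look like the following, assuming the page is served as UTF-8. The function name render_page is hypothetical and not part of any framework, and the snippet does no HTML escaping of the value.

```python
def render_page(name):
    """Return a tiny HTML page as UTF-8 bytes; `name` is a unicode string."""
    # Build the page as unicode and encode once, at the output boundary.
    html = (u'<html><head><meta charset="utf-8"></head>'
            u'<body><p>%s</p></body></html>') % name
    return html.encode('utf-8')

page = render_page(u'Andr\xe9')
```

The encoding used in encode() should match the one declared in the page and in the Content-Type header, otherwise the browser will show AndrÃ© again.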