If I run this code from the console, it works fine (the text is in Russian), but if I run it as a CGI script on an Apache2 server, it fails with: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode characters in position 8-9: ordinal not in range(128). The code is:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import cgitb
cgitb.enable()
print "Content-Type: text/html;charset=utf-8"
print
s=u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'
print s#.encode('utf-8')
Yes, the solution is to uncomment .encode('utf-8'), but I have spent a lot of time trying to understand why this happens, and I can't find the answer.
When running from the console, Python can detect the console's encoding and implicitly encodes Unicode strings printed to the console in that encoding. This can still fail if that encoding doesn’t support the characters you are trying to print. UTF-8 can represent all Unicode characters, but other common console encodings, such as cp437 on US Windows, can’t.
When stdout is not a console, Python 2.X can’t determine a console encoding and defaults to ASCII. That’s why in a web server you have to be explicit and encode your output yourself.
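One way to be explicit is to wrap the output stream in a UTF-8 writer so that every Unicode string written to it is encoded on the way out. The following is only a sketch under stated assumptions: `make_utf8_writer` is a made-up helper name, and an in-memory `io.BytesIO` stands in for the real response stream so the result can be inspected. In an actual CGI script the wrapped stream would be `sys.stdout` on Python 2 or `sys.stdout.buffer` on Python 3.

```python
# -*- coding: utf-8 -*-
import codecs
import io


def make_utf8_writer(byte_stream):
    """Wrap a byte stream so Unicode text written to it is encoded as UTF-8.

    Hypothetical helper for illustration; codecs.getwriter is the real
    standard-library call doing the work.
    """
    return codecs.getwriter('utf-8')(byte_stream)


s = u'Nikolja \u043d\u0435 \u0421\u0430\u0440\u043a\u043e\u0437\u0438!'

# Stand-in for the CGI response stream (assumption: a real script would
# pass sys.stdout on Python 2 or sys.stdout.buffer on Python 3 instead).
fake_stdout = io.BytesIO()

writer = make_utf8_writer(fake_stdout)
writer.write(s + u'\n')  # encoded explicitly, so no implicit ASCII step

body = fake_stdout.getvalue()  # the bytes that would be sent to the client
```

This has the same effect as calling `.encode('utf-8')` on each string, but it is set up once instead of at every `print`.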
As an example, try the following script from a console and from your webserver:
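A minimal sketch of such a script, assuming all it needs to do is report `sys.stdout.encoding`. The helper name `stdout_encoding` is made up for illustration, and the `__future__` import is only there so the same file runs under both Python 2 and Python 3.

```python
#!/usr/bin/env python
from __future__ import print_function
import sys


def stdout_encoding():
    """Return the encoding Python chose for stdout, or None if it has none."""
    return getattr(sys.stdout, 'encoding', None)


# Minimal CGI response: header, blank line, then the detected encoding.
print("Content-Type: text/plain;charset=utf-8")
print()
print(stdout_encoding())
```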
From the console you should get some encoding, but from the web server you should get None. Note that Python 2.X falls back to ascii in that case, whereas Python 3.X falls back to the locale's preferred encoding (often utf-8) when stdout is not a console.
The problem can also occur at a console when redirecting output. This script:
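A sketch of a script along these lines, assuming it simply prints the encoding of each stream to that same stream. `stream_encodings` is a hypothetical helper name, and the `__future__` import keeps the file runnable on both Python 2 and Python 3.

```python
from __future__ import print_function
import sys


def stream_encodings():
    """Return the encodings Python picked for stdout and stderr.

    A value of None means Python could not determine an encoding.
    """
    return {
        'stdout': getattr(sys.stdout, 'encoding', None),
        'stderr': getattr(sys.stderr, 'encoding', None),
    }


encodings = stream_encodings()
print(encodings['stdout'])                   # written to stdout, which may be redirected
print(encodings['stderr'], file=sys.stderr)  # written to stderr, which is not redirected by ">"
```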
returns the following when run directly vs. redirecting stdout:
Note that stderr wasn’t affected since it wasn’t redirected.
The environment variable PYTHONIOENCODING can also be used to override the default encoding of stdin/stdout/stderr.
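A rough way to see the effect of PYTHONIOENCODING is to start a child interpreter with its stdout attached to a pipe (so it is not a console) and ask it what encoding it chose. This is a sketch only: `child_stdout_encoding` is a made-up helper name, and it assumes `sys.executable` points at a usable interpreter.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Run a child Python with piped stdout and return its sys.stdout.encoding.

    If io_encoding is given, it is passed to the child via PYTHONIOENCODING;
    otherwise the variable is removed from the child's environment.
    """
    env = dict(os.environ)
    env.pop('PYTHONIOENCODING', None)
    if io_encoding is not None:
        env['PYTHONIOENCODING'] = io_encoding

    # The child writes the name of its stdout encoding (or "None") to the pipe.
    code = 'import sys; sys.stdout.write(str(sys.stdout.encoding))'
    output = subprocess.check_output([sys.executable, '-c', code], env=env)
    return output.decode('ascii').strip()


print(child_stdout_encoding('utf-8'))  # encoding forced via the environment
print(child_stdout_encoding())         # whatever the interpreter falls back to
```

For a CGI script, the variable would have to be set in the environment the web server passes to the script, for example through the server's own configuration.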