So for the past few days I’ve been trying to learn Python in App Engine. However, I’ve been encountering a number of problems with ASCII and UTF encoding. The freshest issue is as follows:
I have the following piece of code of a simplistic chatroom from the book ‘Code in the Cloud’
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import datetime
# START: MainPage
class ChatMessage(object):
def __init__(self, user, msg):
self.user = user
self.message = msg
self.time = datetime.datetime.now()
def __str__(self):
return "%s (%s): %s" % (self.user, self.time, self.message)
Messages = []
class ChatRoomPage(webapp.RequestHandler):
def get(self):
self.response.headers["Content-Type"] = "text/html"
self.response.out.write("""
<html>
<head>
<title>MarkCC's AppEngine Chat Room</title>
</head>
<body>
<h1>Welcome to MarkCC's AppEngine Chat Room</h1>
<p>(Current time is %s)</p>
""" % (datetime.datetime.now()))
# Output the set of chat messages
global Messages
for msg in Messages:
self.response.out.write("<p>%s</p>" % msg)
self.response.out.write("""
<form action="" method="post">
<div><b>Name:</b>
<textarea name="name" rows="1" cols="20"></textarea></div>
<p><b>Message</b></p>
<div><textarea name="message" rows="5" cols="60"></textarea></div>
<div><input type="submit" value="Send ChatMessage"></input></div>
</form>
</body>
</html>
""")
# END: MainPage
# START: PostHandler
def post(self):
chatter = self.request.get("name")
msg = self.request.get("message")
global Messages
Messages.append(ChatMessage(chatter, msg))
# Now that we've added the message to the chat, we'll redirect
# to the root page, which will make the user's browser refresh to
# show the chat including their new message.
self.redirect('/')
# END: PostHandler
# START: Frame
chatapp = webapp.WSGIApplication([('/', ChatRoomPage)])
def main():
run_wsgi_app(chatapp)
if __name__ == "__main__":
main()
# END: Frame
It works ok in English. However, the moment I add some non-standard characters all sorts of problems start
First of all, in order for the thing to be actually able to display characters in HTML I add meta tag – charset=UTF-8″ etc
Curiously, if you enter non-standard letters, the program processes them nicely, and displays them with no issues. However, it fails to load if I enter any non-ascii letters to the web layout iteself withing the script. I figured out that adding utf-8 encoding line would work. So I added (# –– coding: utf-8 –-). This was not enough. Of course I forgot to save the file in UTF-8 format. Upon that the program started running.
That would be the good end to the story, alas….
It doesn’t work
Long story short this code:
# -*- coding: utf-8 -*-
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import datetime
# START: MainPage
class ChatMessage(object):
def __init__(self, user, msg):
self.user = user
self.message = msg
self.time = datetime.datetime.now()
def __str__(self):
return "%s (%s): %s" % (self.user, self.time, self.message)
Messages = []
class ChatRoomPage(webapp.RequestHandler):
def get(self):
self.response.headers["Content-Type"] = "text/html"
self.response.out.write("""
<html>
<head>
<title>Witaj w pokoju czatu MarkCC w App Engine</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<h1>Witaj w pokoju czatu MarkCC w App Engine</h1>
<p>(Dokladny czas Twojego logowania to: %s)</p>
""" % (datetime.datetime.now()))
# Output the set of chat messages
global Messages
for msg in Messages:
self.response.out.write("<p>%s</p>" % msg)
self.response.out.write("""
<form action="" method="post">
<div><b>Twój Nick:</b>
<textarea name="name" rows="1" cols="20"></textarea></div>
<p><b>Twoja Wiadomość</b></p>
<div><textarea name="message" rows="5" cols="60"></textarea></div>
<div><input type="submit" value="Send ChatMessage"></input></div>
</form>
</body>
</html>
""")
# END: MainPage
# START: PostHandler
def post(self):
chatter = self.request.get(u"name")
msg = self.request.get(u"message")
global Messages
Messages.append(ChatMessage(chatter, msg))
# Now that we've added the message to the chat, we'll redirect
# to the root page, which will make the user's browser refresh to
# show the chat including their new message.
self.redirect('/')
# END: PostHandler
# START: Frame
chatapp = webapp.WSGIApplication([('/', ChatRoomPage)])
def main():
run_wsgi_app(chatapp)
if __name__ == "__main__":
main()
# END: Frame
Fails to process anything I write in the chat application when it’s running. It loads but the moment I enter my message (even using only standard characters) I receive
File "D:\Python25\lib\StringIO.py", line 270, in getvalue
self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 64: ordinal not in range(128)
error message. In other words, if I want to be able to use any characters within the application I cannot put non-English ones in my interface. Or the other way round, I can use non-English characters within the app only if I don’t encode the file in utf-8. How to make it all work together?
Your strings contain unicode characters, but they’re not unicode strings, they’re byte strings. You need to prefix each one with
u(as inu"foo") in order to make them into unicode strings. If you ensure all your strings are Unicode strings, you should eliminate that error.You should also specify the encoding in the
Content-Typeheader rather than a meta tag, like this:Note your life would be a lot easier if you used a templating system instead of writing HTML inline with your Python code.