I’m using Django to manage a Postgres database. I have a value stored in the database representing a city in Spain (Málaga). My Django project uses unicode strings for everything by putting from __future__ import unicode_literals at the beginning of each of the files I created.
I need to pull the city information from the database and send it to another server using an XML request. There is logging in place along the way so that I can observe the flow of data. When I try and log the value for the city I get the following traceback:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)
Here is the code I use to log the values I’m passing.
def createXML(self, dict):
"""
.. method:: createXML()
Create a single-depth XML string based on a set of tuples
:param dict: Set of tuples (simple dictionary)
"""
xml_string = ''
for key in dict:
self.logfile.write('\nkey = {0}\n'.format(key))
if (isinstance(dict[key], basestring)):
self.logfile.write('basestring\n')
self.logfile.write('value = {0}\n\n'.format(dict[key].decode('utf-8')))
else:
self.logfile.write('value = {0}\n\n'.format(dict[key]))
xml_string += '<{0}>{1}</{0}>'.format(key, dict[key])
return xml_string
I’m basically saving all the information I have in a simple dictionary and using this function to generate an XML formatted string – this is beyond the scope of this question.
The error I am getting had me wondering what was actually being saved in the database. I have verified the value is utf-8 encoded. I created a simple script to extract the value from the database, decode it and print it to the screen.
from __future__ import unicode_literals
import psycopg2
# Establish the database connection
try:
db = psycopg2.connect("dbname = 'dbname' \
user = 'user' \
host = 'IP Address' \
password = 'password'")
cur = db.cursor()
except:
print "Unable to connect to the database."
# Get database info if any is available
command = "SELECT state FROM table WHERE id = 'my_id'"
cur.execute(command)
results = cur.fetchall()
state = results[0][0]
print "my state is {0}".format(state.decode('utf-8'))
Result: my state is Málaga
In Django I’m doing the following to create the HTTP request:
## Create the header
http_header = "POST {0} HTTP/1.0\nHost: {1}\nContent-Type: text/xml\nAuthorization: Basic {2}\nContent-Length: {3}\n\n"
req = http_header.format(service, host, auth, len(self.xml_string)) + self.xml_string
Can anyone help me correct the problem so that I can write this information to the database and be able to create the req string to send to the other server?
Am I getting this error as a result of how Django is handling this? If so, what is Django doing? Or, what am I telling Django to do that is causing this?
EDIT1:
I’ve tried to use Django’s django.utils.encoding on this state value as well. I read a little from saltycrane about a possible hiccup Djano might have with unicode/utf-8 stuff.
I tried to modify my logging to use the smart_str functionality.
def createXML(self, dict):
"""
.. method:: createXML()
Create a single-depth XML string based on a set of tuples
:param dict: Set of tuples (simple dictionary)
"""
xml_string = ''
for key in dict:
if (isinstance(dict[key], basestring)):
if (key == 'v1:State'):
var_str = smart_str(dict[key])
for index in range(0, len(var_str)):
var = bin(ord(var_str[index]))
self.logfile.write(var)
self.logfile.write('\n')
self.logfile.write('{0}\n'.format(var_str))
xml_string += '<{0}>{1}</{0}>'.format(key, dict[key])
return xml_string
I’m able to write the correct value to the log doing this but I narrowed down another possible problem with the .format() string functionality in Python. Of course my Google search of python format unicode had the first result as Issue 7300, which states that this is a known “issue” with Python 2.7.
Now, from another stackoverflow post I found a “solution” that does not work in Django with the smart_str functionality (or at least I’ve been unable to get them to work together).
I’m going to continue digging around and see if I can’t find the underlying problem – or at least a work-around.
EDIT2:
I found a work-around by simply concatenating strings rather than using the .format() functionality. I don’t like this “solution” – it’s ugly, but it got the job done.
def createXML(self, dict):
"""
.. method:: createXML()
Create a single-depth XML string based on a set of tuples
:param dict: Set of tuples (simple dictionary)
"""
xml_string = ''
for key in dict:
xml_string += '<{0}>'.format(key)
if (isinstance(dict[key], basestring)):
xml_string += smart_str(dict[key])
else:
xml_string += str(dict[key])
xml_string += '<{0}>'.format(key)
return xml_string
I’m going to leave this question unanswered as I’d love to find a solution that lets me use .format() the way it was intended.
This is correct approach (problem was with opening file. With UTF-8 You MUST use
codecs.open():And this is from python console:
And this is
test.logfile: