Today I received data via the Django admin that could not be decoded. Somehow the data is not stored as unicode. How is this possible?
I have a name property on my Client model which should return the data as unicode:
@property
def name(self):
    return u'{0} {1}'.format(self.firstname, self.lastname).strip()
But this doesn't work:
>>> client
<Client: [Bad Unicode data]>
>>> client.lastname
'Dani\xc3\xabl'
>>> client.lastname.__class__
<type 'str'>
>>> u"{0} {1}".format(client.firstname, client.lastname)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
Strangely enough, formatting the first and last name as a regular byte string does work:
>>> "{0} {1}".format(client.firstname, client.lastname)
'Test Dani\xc3\xabl'
>>> "{0} {1}".format(client.firstname, client.lastname).decode('utf-8')
u'Test Dani\xebl'
What happened here, and how did this input get into my model via the admin?
System stack (it’s an external server):
- Debian 6.0.5 (Squeeze)
- Django 1.4.1
- Python 2.6.6
- MySQL 5.1.49
- MySQL-python==1.2.2
This is the relevant model code:
class Client(models.Model):
    firstname = models.CharField(_("Firstname"), max_length=255)
    lastname = models.CharField(_("Lastname"), max_length=255)
    email = models.EmailField(_("Email"), unique=True, max_length=255)

    class Meta:
        db_table = u'clients'
        ordering = ('firstname', 'lastname', 'email')

    def __unicode__(self):
        return u'{0} <{1}>'.format(self.name, self.email)

    @property
    def name(self):
        return u'{0} {1}'.format(self.firstname, self.lastname).strip()
This is probably due to the collation you are using for your MySQL database.
Django normally returns unicode strings when retrieving data from the database, which would work with your code, as there is nothing wrong with it. However, as you can see in the Django documentation on database settings, section "Collation settings", using MySQLdb version 1.2.2 with a utf8_bin collated MySQL database will cause you to get bytestrings, not unicode strings, when retrieving CharFields from the database. That explains the traceback: u"{0} {1}".format(...) has to convert the bytestring arguments to unicode, Python 2 does that with the ascii codec, and the conversion fails on the byte 0xc3.
You might want to investigate this (that is, check your MySQL collation settings), but it is likely that your problem is coming from there.
If this is the case, you will have to decode by hand any input that you are getting from MySQL. Alternatively, you could change the collation settings of your database.
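Decoding by hand could look roughly like the sketch below. The helper names to_text and full_name are hypothetical (they are not part of Django or of your model), and UTF-8 is assumed as the encoding because the bytes '\xc3\xab' in your console output are the UTF-8 form of "ë". It is written to run on Python 2.6 as well as Python 3.

```python
from __future__ import unicode_literals


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as a unicode string.

    Bytestrings are decoded with the given encoding (UTF-8 is an
    assumption); anything else is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def full_name(firstname, lastname):
    """Hypothetical stand-in for the body of the Client.name property."""
    return '{0} {1}'.format(to_text(firstname), to_text(lastname)).strip()


# The bytestring from the question, as returned by the database backend.
raw_lastname = b'Dani\xc3\xabl'

# Decoding with the ascii codec fails on byte 0xc3, which is the error
# reported in the traceback.
try:
    raw_lastname.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding explicitly as UTF-8 gives the expected unicode name.
name = full_name('Test', raw_lastname)
```

In the model, the name property would then call something like to_text on self.firstname and self.lastname before formatting. Django 1.4 also ships django.utils.encoding.smart_unicode, which may serve the same purpose as the hypothetical to_text helper.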
You can use SHOW TABLE STATUS FROM %YOURDB% to get the collation of the tables in your database.
Excerpt from the relevant documentation section: