I’m setting up the Django admin on top of a legacy MySQL database.
The database declares that it is latin-1 encoded. Some of the data in it is indeed latin-1, but some is actually UTF-8. The UTF-8 data shows up as corrupt characters like: Ã© â‚¬ Ã¤ Ã¶
The legacy application does some black magic to hide these errors and I cannot modify the database.
I found a Python library, ftfy, that can convert latin-1-corrupted UTF-8 to real UTF-8; for example, the above chars get translated to “é € ä ö”. I want to use it on all django.db.models.CharField and django.db.models.TextField data that is loaded from the database. How can I do that?
I tried to subclass django.db.models.CharField and django.db.models.TextField but couldn’t figure out where to intercept the data coming from the database. The ideal solution would be something like an FTFYCharField that always corrects the data it gets from the database.
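Assuming read-only access, I think what you are looking for is "Writing custom model fields" in the Django documentation. In particular, look at the section "Converting database values to Python objects". In the .to_python() method you can do whatever you want to any or all fields read from the DB. (In Django 1.8 and later, values loaded from the database go through from_db_value() instead; to_python() is used for deserialization and form cleaning.)
If you also need to write (and maintain the weirdness), see the section on "Preprocessing values before saving".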
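As a rough sketch of the read-only approach, the hook can live in a small mixin that is combined with Django's field classes. The names repair_mojibake and MojibakeFixMixin are made up for this illustration, and the repair function is a standard-library stand-in so that the snippet runs without extra packages; it assumes the corruption is UTF-8 bytes decoded as cp1252 (which is what MySQL calls latin1) and leaves strings mixing both encodings untouched. In a real project you would probably call ftfy.fix_text(value) there instead, since it handles more cases.

```python
def repair_mojibake(value):
    """Undo UTF-8 text that was decoded as cp1252; return anything else unchanged.

    Hypothetical helper for this sketch; ftfy.fix_text() is a more robust substitute.
    """
    if not isinstance(value, str):
        return value  # NULL columns arrive as None
    try:
        # Recover the original bytes, then decode them as the UTF-8 they really are.
        return value.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Genuine latin-1 text (or anything not reversible) is kept as it is.
        return value


class MojibakeFixMixin:
    """Hypothetical mixin: repairs text every time Django loads it from the DB."""

    def from_db_value(self, value, expression, connection):
        # Django (1.8+) calls this hook for each value read from the database.
        return repair_mojibake(value)


# Assumed usage inside a Django project's models module:
#
#     from django.db import models
#
#     class FTFYCharField(MojibakeFixMixin, models.CharField):
#         pass
#
#     class FTFYTextField(MojibakeFixMixin, models.TextField):
#         pass

if __name__ == "__main__":
    print(repair_mojibake("Ã© â‚¬ Ã¤ Ã¶"))  # prints: é € ä ö
```

Because the mixin only touches the read path, values are saved exactly as the user typed them (as proper Unicode); if the legacy application needs the old broken form written back, that would have to be handled separately in get_prep_value().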