We have a MySQL table with fields using latin1_swedish_ci however there is some UTF-8 encoded data in there.
When I change the collation from latin1_swedish_ci to utf8_general_ci it converts the ISO-8859-1 stuff fine, but the UTF-8 data gets broken, seeing things like… you’re.
Any ideas?
Edit: we are using InnoDB with MySQL 5.1
So, your “latin1” column had some data encoded in UTF-8, but not everything, and now you have some data doubly-encoded in UTF-8?
You could do this to fix the doubly encoded values: first convert back to latin1, re-interprete the string as binary, and tell MySQL that the binary string is actually text encoded in UTF-8. In MySQL this can be written as:
Now the problem is detecting which strings are doubly encoded, so that you can update only those. This can be done by comparing the length of the original string in characters with the length of the new string in bytes; for doubly-encoded text they should be equal. This is how you end up with: