Ok, so I’ve ensured that my MySQL (5.1.61) database is UTF8, the table is UTF8, the field is UTF8, and the MySQL client’s charset is set to UTF8. I can store and retrieve UTF8 entries successfully. I’ve also ensured my terminal’s encoding is set to UTF8.
CREATE TABLE `cities` (
`name` varchar(255) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
The issue is with the 200,000 entries that already exist in the database. It appears the people we inherited the project from messed up a lot of the encoding, actually saving a string like Hörby as HÃ¶rby, where Ã and ¶ are valid UTF8 characters. That is, MySQL is receiving the UTF8 string HÃ¶rby and storing it as such. Here is an example where the first entry is one of the old entries, and the second is us inserting “Hörby” into the database with everything set to UTF8:
mysql> INSERT INTO cities SET name = 'Hörby';
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM cities;
+----------+
| name     |
+----------+
| HÃ¶rby   | <--- old entry
| Hörby    | <--- new entry
+----------+
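One way to confirm the double encoding is to look at the raw bytes with HEX(). This is a sketch, not something from the original thread, and it assumes the `cities.name` column from the question. A correctly stored ö is the two bytes C3B6, while a double-encoded one should appear as the four bytes C383C2B6 (the UTF8 encodings of Ã and ¶). The `repaired` column previews a latin1 round trip; it is only meaningful for double-encoded rows, and correctly stored rows such as the new entry will come out garbled or truncated, which also helps tell the two kinds apart.

```sql
-- Inspect the stored bytes: a correctly stored 'Hörby' is 48C3B6726279,
-- while the double-encoded 'HÃ¶rby' is 48C383C2B6726279.
SELECT name,
       HEX(name) AS raw_bytes,
       -- Transcode utf8 -> latin1, then reinterpret those bytes as utf8.
       CONVERT(BINARY CONVERT(name USING latin1) USING utf8) AS repaired
FROM cities;
```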
What can we do to convert the squished characters into what they once were? We’re pretty much ready to do anything at this point, but re-typing all 200,000 records is not feasible.
It looks like you had previously stored utf8-encoded strings in a latin1 column, then converted that column to utf8. To fix that:
1. Convert the data back to latin1.
2. Change the column type to UTF-8 without altering the data (going via binary).
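The two steps can be sketched as follows, assuming `cities.name` is still defined as in the question's CREATE TABLE (VARCHAR(255), utf8). VARBINARY(255) is an assumed choice of intermediate binary type. Converting to a binary type and back does not transcode, so the bytes produced by step 1 are simply relabeled as utf8. Back up the table first. This rewrites every row, so rows that were already stored correctly, like the new “Hörby” entry, would themselves be damaged; it suits a column where all existing data is double-encoded.

```sql
-- Step 1: transcode the stored characters back to latin1. For a
-- double-encoded row, 'HÃ¶rby' becomes the original UTF8 bytes 48C3B6726279.
ALTER TABLE cities MODIFY name VARCHAR(255) CHARACTER SET latin1 DEFAULT NULL;

-- Step 2a: relabel the column as binary; the bytes are kept unchanged.
ALTER TABLE cities MODIFY name VARBINARY(255) DEFAULT NULL;

-- Step 2b: relabel the same bytes as utf8, again without transcoding.
ALTER TABLE cities MODIFY name VARCHAR(255) CHARACTER SET utf8 DEFAULT NULL;
```

Afterwards, `SELECT name, HEX(name) FROM cities;` should show the old row as Hörby with the bytes 48C3B6726279.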