I have to redesign a class where (amongst other things) UTF-8 strings are wrongly double-encoded:
$string = iconv('ISO-8859-1', 'UTF-8', $string);
:
$string = utf8_encode($string);
These faulty strings have been saved into multiple table fields all over a MySQL database. All affected fields use the collation utf8_general_ci.
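To make the damage concrete, here is a minimal sketch of what double encoding does to the umlauts. It uses Python as a stand-in for the PHP calls above, on the assumption that both treat the input as ISO-8859-1, one character per byte. The names `double_encode` and `MOJIBAKE` are made up for illustration.

```python
def double_encode(text: str) -> str:
    """Mimic encoding a string to UTF-8 when it already is UTF-8."""
    utf8_bytes = text.encode("utf-8")
    # The faulty step: the UTF-8 bytes are misread as ISO-8859-1,
    # so every byte turns into its own character.
    return utf8_bytes.decode("latin-1")


# Map each affected character to the broken form that ends up in the table.
MOJIBAKE = {ch: double_encode(ch) for ch in "äöüßÄÖÜ"}

# UTF-8 bytes actually stored for a double-encoded "ä": c3 83 c2 a4
STORED_A_UMLAUT = MOJIBAKE["ä"].encode("utf-8")
```

Note that for ß, Ä, Ö and Ü the second character of the broken form is an invisible control character (U+0080 to U+009F) under this assumption, which makes those values hard to type into a query by hand.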
Usually I’d set up a little PHP patch script that loops through the affected tables, SELECTs the records, corrects the faulty ones by using utf8_decode() on the double-encoded fields, and UPDATEs them.
As I’ve got many huge tables this time, and the error only affects German umlauts (äöüßÄÖÜ), I’m wondering if there’s a smarter/faster solution than that.
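The per-record correction in such a patch script could look roughly like this. It is a sketch in Python rather than PHP, and `repair` is a hypothetical helper, not part of any library. It assumes the broken values went through exactly one extra ISO-8859-1 to UTF-8 conversion, and it leaves a value alone when reversing that conversion is not possible, so that records which are already correct are not changed.

```python
def repair(value: str) -> str:
    """Undo one round of double encoding, or return the value unchanged."""
    try:
        # Reverse the faulty step: characters back to single bytes,
        # then read those bytes as the UTF-8 they originally were.
        return value.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the value contains characters outside ISO-8859-1, or the
        # resulting bytes are not valid UTF-8. In both cases it was not
        # double-encoded in the assumed way, so it is left as it is.
        return value
```

For example, `repair("MÃ¼ller")` gives `"Müller"`, while `repair("Müller")` and `repair("Mueller")` come back unchanged.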
Are pure MySQL solutions like the following safe and recommendable?
UPDATE `table` SET `col` = REPLACE(`col`, 'Ã¤', 'ä');
Any other solutions/best practices?
Alter the table to change the column character set to Latin-1. You will now have singly-encoded UTF-8 strings, but sitting in a field whose character set is supposed to be Latin-1.
Then change the column character set back to UTF-8 via the binary character set. That way MySQL only relabels the bytes and doesn’t convert the characters on the way back.
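(Something like ALTER TABLE tbl MODIFY col ... CHARACTER SET latin1; followed by the same statement with CHARACTER SET binary and then with CHARACTER SET utf8 is the correct syntax iirc; tbl and col are placeholders, and put the appropriate column type in where ... is.)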
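Why the detour through the binary character set works can be followed on the raw bytes of a single value. The sketch below is only a simulation in Python, not something to run against the database, and `alter_roundtrip` is a made-up name. It also assumes that Python's `latin-1` codec (ISO-8859-1) behaves like MySQL's latin1 character set; MySQL's latin1 is based on cp1252, so characters whose broken form contains a byte in the range 0x80 to 0x9F (Ä, Ö, Ü, ß under this assumption) may come out differently on a real server. Trying the statements on a copy of one table first seems wise.

```python
def alter_roundtrip(stored: bytes) -> str:
    """Follow one column value through utf8 -> latin1 -> binary -> utf8."""
    # Step 1, utf8 -> latin1: the characters are converted, so each
    # character of the broken string becomes a single byte again.
    as_latin1 = stored.decode("utf-8").encode("latin-1")
    # Step 2, latin1 -> binary: only the column's label changes,
    # the bytes stay exactly as they are.
    as_binary = as_latin1
    # Step 3, binary -> utf8: again only a relabel, so the bytes are
    # now simply read as UTF-8.
    return as_binary.decode("utf-8")
```

Feeding it the stored bytes of a double-encoded name, `alter_roundtrip("MÃ¼ller".encode("utf-8"))`, returns `"Müller"`.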