A site I’m working on recently had an issue with its database. Apparently it got corrupted, and when the tables were restored, any text field containing an unusual symbol (e.g. the one-half symbol or the degree symbol) was cut off at the character before that symbol. I’ve got a copy of the table and distilled it down to the code below:
CREATE TABLE `products2` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
insert into products2 values
(25, 0x5468652044504D203931322069732061206C617267652033BD204469676974204C434420566F6C746D657465722E20546865207369676E616C206265696E67206D6561737572656420697320616C736F207573656420746F20706F77657220746865206D657465722C20696E636C7564696E6720746865206261636B6C696768742E20546865206D657465722066656174757265732061203320746F20363056206D6561737572656D656E742072616E67652C20776974682061207265736F6C7574696F6E206F662031306D56206265747765656E20332E303020616E642031392E39395620616E64203130306D56206265747765656E2032302E3020616E642036302E30562E205768656E2074686520766F6C746167652064726F70732062656C6F772033562C204C4F20697320646973706C617965642028646F776E20746F20322E38562C207768656E2074686520646973706C61792077696C6C207475726E206F6666292E209148499220697320646973706C61796564207768656E2074686520766F6C7461676520676F65732061626F7665203630562E0D0A0D0A5363726577207465726D696E616C7320616C6C6F7720666F7220717569636B20616E64206561737920636F6E6E656374696F6E2E20546865206D6574657220697320686F7573656420696E206120726F6275737420636172726965722077686963682063616E20626520626F6C74656420696E20706C616365206F722070616E656C206D6F756E746564207573696E6720746865206C6F772070726F6669206C652062657A656C20616E6420636C6970732070726F76696465642E20416E2049503637202F204E454D412034582062657A656C20697320616C736F20617661696C61626C6520666F722070726F74656374696F6E20616761696E7374206475737420616E64206D6F6973747572652E0D0A0D0A417320746869732069732061206E65772064657369676E2077652073756767657374207468617420796F7520636F6E74616374204C617363617220666F7220757020746F2064617465206C6561642D74696D6520696E666F726D6174696F6E206265666F7265206F72646572696E67206F6E6C696E652E0D0A)
This throws an error:
#1366 - Incorrect string value: '\xBD Digi...' for column 'description' at row 1
Looking into this problem on Stack Overflow and around the web, it seems to be an encoding issue. I’ve tried changing the collation of the description field to utf8_unicode_ci and the collation of the table to utf8_bin (and all combinations of those), all to no avail.
I can’t redo the dump as it’s a backup. I don’t understand how the system can output the dump but not accept it back. Presumably the backup was made via the command line (I’m not certain), and I am using phpMyAdmin to restore it; I don’t know if that makes a difference.
If it’s not possible to import the data I’d be grateful if someone could tell me how to read the encoded data into text that I can then manually cut and paste.
Decoding the first 32 bytes as ASCII, we have (where ? is the 0xBD byte about which MySQL is complaining):
The DPM 912 is a large 3? Digit
A little bit of Googling for “DPM 912” suggests to me that character should be the vulgar one-half fraction, ½.
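The decoding step above can be reproduced without a database. The following is a minimal Python sketch (Python is used here only because it needs no MySQL server; the variable names are illustrative). It decodes the first 32 bytes of the hex literal as ASCII, lists the bytes outside the ASCII range, and confirms that the bytes are not valid UTF-8, which is consistent with the error MySQL reports for the utf8 column.

```python
# First 32 bytes of the hex literal from the INSERT statement in the question.
prefix = bytes.fromhex(
    "5468652044504D203931322069732061"
    "206C617267652033BD20446967697420"
)

# Decode as ASCII, showing every non-ASCII byte as "?".
as_ascii = prefix.decode("ascii", errors="replace").replace("\ufffd", "?")
print(repr(as_ascii))  # 'The DPM 912 is a large 3? Digit '

# Offset and value of each byte outside the ASCII range.
bad = [(i, hex(b)) for i, b in enumerate(prefix) if b >= 0x80]
print(bad)  # [(24, '0xbd')]

# Check whether the same bytes would pass as UTF-8.
try:
    prefix.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```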
A number of character sets encode that character with the byte 0xBD, but one in particular jumps out: windows-1252, which was not only the default codepage in the (pre-Unicode) Windows world, but was also MySQL’s default encoding prior to version 8.0. It’d be a good guess that your data is encoded in windows-1252.
As explained in the MySQL manual, you can specify the encoding of a string literal by prefixing it with an introducer, which is the encoding name preceded by an underscore:
[_charset_name]'string' [COLLATE collation_name]
It goes on to say that an introducer is also permitted before hexadecimal literals (x'literal' and 0xnnnn).
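Therefore (and because MySQL refers to windows-1252 as latin1), you could change your INSERT command to the following, with the hex literal kept in full rather than abbreviated as it is here:
insert into products2 values (25, _latin1 0x5468652044504D20...);
The documentation also describes what happens when a literal has no introducer.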
That is, if such an introducer is omitted (as it was in your original INSERT statement), the character set is assumed to be that defined by the character_set_connection system variable.
There are a number of ways of setting that variable, including by specifying it when your client connects. In phpMyAdmin, that is set with the [DefaultCharset] configuration option, whose default was latin1 prior to v3.4 but has been utf8 since; perhaps this change is the origin of your problems. One can also specify the character set of import files with [Import][charset]. If one doesn’t specify the desired character set upon connecting, issuing a command such as SET NAMES 'latin1' after connecting but before your INSERT command will fix it (you could, for example, add it to the top of your dump file).
My recommendation, which makes the dumpfile as portable as possible, would be to add SET NAMES 'latin1' to the top of it.
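If the import still fails, the fallback asked for in the question (turning the hex data into text that can be pasted manually) can be sketched in Python. This assumes the guess above is right and the bytes are windows-1252, which Python calls cp1252; the helper name decode_hex_literal and the two sample fragments are illustrative, with the fragments copied from the hex literal in the question. Note that Python's cp1252 codec rejects the few bytes that windows-1252 leaves undefined (such as 0x81), whereas MySQL's latin1 accepts them, so a real dump might need errors="replace".

```python
def decode_hex_literal(literal: str, encoding: str = "cp1252") -> str:
    """Turn a MySQL-style 0x... literal into text using the given encoding."""
    hex_digits = literal[2:] if literal.lower().startswith("0x") else literal
    return bytes.fromhex(hex_digits).decode(encoding)


# Two fragments taken from the hex literal in the question.
sample_half = "0x33BD204469676974204C4344"
sample_quotes = "0x9148499220697320646973706C61796564"

print(decode_hex_literal(sample_half))    # 3½ Digit LCD
print(decode_hex_literal(sample_quotes))  # ‘HI’ is displayed
```

The second fragment lends some support to the windows-1252 guess: the bytes 0x91 and 0x92 decode to curly quotation marks there, whereas in ISO-8859-1 they are control characters.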