Here’s a sample string, stored in a MySQL database running on a Linux server: ™
That’s the single trademark character (U+2122), which is represented as 0x2122 in UTF-16BE, or as the three bytes 0xE284A2 in UTF-8.
The database table uses the utf8 character set with the utf8_unicode_ci collation. I’m running PHP on another Linux server, whose internal encoding (as reported by mb_internal_encoding) is ISO-8859-1, which uses the same encoding for this character as UTF-8.
When I run a SQL query to get the string, it returns 0x0099, which is its representation in Windows-1252.
How would that even happen, and how can I fix it so that the string is returned in a more sensible code page?
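The behavior you observe is due to the default character set of the MySQL client connection. When that connection character set is latin1 (the default in many older client setups), the server converts the utf8 column data to latin1 before sending it. MySQL's latin1 is actually Windows-1252 (cp1252), where ™ is the single byte 0x99.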
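To see where the 0x99 comes from, the same conversion can be reproduced in PHP. This is a small sketch assuming the mbstring extension is loaded (the question already uses mb_internal_encoding); it prints the bytes of U+2122 in each encoding mentioned in the question.

```php
<?php
// Show the bytes of U+2122 (TRADE MARK SIGN) in the encodings from the question.
// Assumes the mbstring extension is loaded.
$tm = "\u{2122}"; // PHP 7+ code point escape; stored as UTF-8 bytes in the string

foreach (['UTF-8', 'UTF-16BE', 'Windows-1252'] as $encoding) {
    // mb_convert_encoding(string, to_encoding, from_encoding)
    $bytes = mb_convert_encoding($tm, $encoding, 'UTF-8');
    printf("%-12s 0x%s\n", $encoding, strtoupper(bin2hex($bytes)));
}
```

The three lines should show 0xE284A2, 0x2122 and 0x99 respectively: the last one is exactly the byte the query returned.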
You can override the default and specify the character set to be used for the client connection. If you are using mysqli, call set_charset right after connecting and before running any queries, e.g. $mysqli->set_charset('utf8');
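A slightly fuller sketch, assuming the mysqli extension and a reachable MySQL server; the function name openUtf8Connection, the credentials, and the table/column in the usage comment are hypothetical. It sets the character set through the API call rather than a SET NAMES query, which is what the php.net character set page cited below advises, because escaping functions such as mysqli::real_escape_string rely on the client library knowing the connection character set.

```php
<?php
// Sketch: open a mysqli connection whose client character set is UTF-8,
// so MySQL sends utf8 column data unchanged instead of converting it to latin1 (cp1252).
// Hypothetical helper: the name and parameters are illustrative only.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw mysqli_sql_exception on errors instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $pass, $db);

    // Set the connection character set immediately, before any query runs.
    // 'utf8' matches the table's character set; on MySQL 5.5.3 and later,
    // 'utf8mb4' also works and covers characters outside the BMP.
    $mysqli->set_charset('utf8');

    return $mysqli;
}

// Usage (placeholder credentials and hypothetical table/column):
// $db  = openUtf8Connection('localhost', 'app_user', 'secret', 'app_db');
// $row = $db->query('SELECT name FROM products LIMIT 1')->fetch_row();
// echo bin2hex($row[0]), "\n"; // e284a2 when the column holds the ™ sample
```

Keeping the charset call inside the connection helper means every query made through it sees UTF-8, rather than relying on each caller to remember it.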
See:
http://php.net/manual/en/mysqlinfo.concepts.charset.php
http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html