I’m having an issue that I believe is related to Unicode text. When the user enters a string that contains the Unicode bullet character, MySQL is not able to save that field (the rest of the update query works, though). Here’s how I’ve been trying to deal with it:
$str = "· Close up the server";
$str = preg_replace("\u2022", "•", $str);
…however this is still not working.
So many things can go wrong here, because the database, form submits and source code string literals are all involved. I’ll assume you want to use UTF-8, because with any other typical encoding (CP1252, Latin1) you’ll be screwed when you want to use the json_* functions or accept more than ~200 different characters.
The first thing to do is remove any conversion code that was written with the intention of fixing encoding issues, such as utf8_encode, htmlentities, *_replace and so on.
Source encoding
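When you write a string literal that contains the bullet character, as in the question, the PHP source file needs to be physically encoded in UTF-8. If you are on Windows, you must explicitly save the file as UTF-8 or configure your editor to do so. UTF-8 doesn’t happen magically on Windows.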
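One quick way to check how the file was actually saved is to look at the bytes of the literal. This is only a sketch; the variable name is made up for illustration.

```php
<?php
// The bullet typed directly into the source file.
$bullet = "•";

// U+2022 encoded as UTF-8 is the three bytes E2 80 A2.
// If the file had been saved as Windows-1252, the literal would be the single byte 95.
if (bin2hex($bullet) === 'e280a2') {
    echo "Source file is UTF-8\n";
} else {
    echo "Unexpected bytes: " . bin2hex($bullet) . "\n";
}
```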
Form submits
When a user submits a form, the payload will be in whatever encoding you declared the page to be. You can declare it in the Content-Type response header or in a meta charset tag in the page.
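A minimal sketch of a page that declares UTF-8 both in the HTTP header and in the markup. The helper name render_utf8_page and the form field name str are assumptions for illustration, not something from the question.

```php
<?php
// Hypothetical helper: wraps a body fragment in a page that declares UTF-8.
function render_utf8_page(string $bodyHtml): string
{
    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n<meta charset=\"utf-8\">\n<title>Example</title>\n</head>\n"
         . "<body>\n" . $bodyHtml . "\n</body>\n</html>\n";
}

// Declare the encoding in the HTTP header too; headers must go out before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// accept-charset states explicitly which encoding the browser should submit.
echo render_utf8_page(
    '<form method="post" accept-charset="utf-8">'
    . '<input name="str"><button>Save</button></form>'
);
```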
But anyone can actually submit arbitrary bytes to your server, so you should validate the input is in UTF-8 before proceeding.
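Such a check could look roughly like this. The helper require_utf8 and the field name are hypothetical; the only real API used is mb_check_encoding from the mbstring extension.

```php
<?php
// Hypothetical helper: returns the field only if it is well-formed UTF-8.
function require_utf8(array $input, string $field): string
{
    // A missing field is treated as an empty string.
    $value = $input[$field] ?? '';

    // Reject arrays and byte sequences that are not valid UTF-8.
    if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException("Field '$field' is not valid UTF-8");
    }
    return $value;
}

// Typical use in a form handler (the field name 'str' is an assumption):
// $str = require_utf8($_POST, 'str');
```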
mb_check_encoding is a good fit for this.
Database
At this point your input strings are known to be UTF-8. You must tell the database so after connecting, by specifying a connection encoding.
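A minimal sketch using mysqli, assuming that is the extension in use. The helper names, credentials, and the notes table are placeholders. utf8mb4 is MySQL's name for full UTF-8; MySQL's plain utf8 is limited to three bytes per character, which would still cover the bullet.

```php
<?php
// Hypothetical helper; host, credentials and schema name are placeholders.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $pass, $db);

    // Set the connection encoding right after connecting.
    $mysqli->set_charset('utf8mb4');

    return $mysqli;
}

// Hypothetical usage; the table and column names are made up.
function save_note(mysqli $mysqli, int $id, string $text): void
{
    // A prepared statement passes the UTF-8 bytes through unchanged.
    $stmt = $mysqli->prepare('UPDATE notes SET body = ? WHERE id = ?');
    $stmt->bind_param('si', $text, $id);
    $stmt->execute();
}
```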
This makes the database interpret your input as UTF-8 and encode its output as UTF-8. You should also set your columns, tables and databases to UTF-8.
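For the storage side, the statements involved might look like the following. This is a sketch only: the helper name, the schema name app and the table name notes are invented, and the utf8mb4_unicode_ci collation is an assumption rather than a requirement.

```php
<?php
// Hypothetical helper: builds MySQL statements that switch a schema and one
// table to utf8mb4. Identifiers are wrapped in backticks, with embedded
// backticks doubled, before being interpolated.
function utf8mb4_migration(string $database, string $table): array
{
    $quote = function (string $name): string {
        return '`' . str_replace('`', '``', $name) . '`';
    };

    return [
        // Changes the default for tables created later in this schema.
        'ALTER DATABASE ' . $quote($database)
            . ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci',
        // Converts the existing text columns of the table.
        'ALTER TABLE ' . $quote($table)
            . ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci',
    ];
}

foreach (utf8mb4_migration('app', 'notes') as $sql) {
    echo $sql, ";\n";
}
```

CONVERT TO rewrites existing data, so it is worth trying on a backup first.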
Unicode escape sequences
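The escape sequences \uxxxx, \uhhhh\ullll and \Uxxxxxxxx are not supported in PHP string literals. PHP 7.0 and later accept the braced form, for example \u{2022}, in double-quoted strings.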
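If you do need the bullet without typing it literally, these alternatives work. The sketch assumes PHP 7.0 or later for the braced escape; the variable names are illustrative.

```php
<?php
// Three ways to get U+2022 (BULLET) as UTF-8 bytes without typing it literally.
$fromBytes = "\xE2\x80\xA2";            // raw UTF-8 bytes, works in any PHP version
$fromBrace = "\u{2022}";                // braced code point escape, PHP 7.0 and later
$fromJson  = json_decode('"\u2022"');   // JSON itself does understand \uXXXX

// All three produce the same byte sequence.
var_dump($fromBytes === $fromBrace, $fromBrace === $fromJson);

// In a regular expression, PCRE's \x{2022} with the /u modifier matches the bullet.
$str = "\u{2022} Close up the server";
echo preg_replace('/\x{2022}/u', '*', $str), "\n";
```

Note that the pattern in the question has no delimiters, so preg_replace would reject it regardless of the escape sequence.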