I am querying a server for some information (returned as JSON), inter alia a list of names, and one of those names comes back containing garbled characters:
Ðемања Матејић
This is how it should be:
Немања Матејић
I have tried the following:
- Removing the BOM (byte-order mark) from the string (otherwise PHP won’t decode the JSON), then decoding it with json_decode and inserting the name directly into my UTF-8-encoded MySQL database.
- Using a field with a UTF-8 collation.
… to no avail – the value in the database still remains flawed.
How can I solve this?
Edit:
Running SHOW VARIABLES LIKE '%character%' returns
character_set_client utf8
character_set_connection utf8
character_set_database utf8
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir /data/mysql/fuentez/share/mysql/charsets/
Is it possibly because character_set_server is latin1?
You stored the data in the database as latin1 instead of UTF-8.
For example, the string Ðµ encoded as latin1 becomes 0xd0 0xb5, which is the UTF-8 encoding of the Cyrillic letter е.
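To see this at the byte level, here is a minimal sketch. It is written in Python rather than PHP purely for illustration, and the variable names are my own, not anything from the question. It only shows how UTF-8 bytes read as latin1 produce the garbled text, and that the damage is reversible as long as no bytes were dropped along the way.

```python
# Cyrillic letter "е" (U+0435), as it appears in the name from the question.
original = "\u0435"

# UTF-8 encodes it as the two bytes 0xd0 0xb5.
utf8_bytes = original.encode("utf-8")
print(" ".join(f"0x{b:02x}" for b in utf8_bytes))

# Interpreting those same two bytes as latin1 yields two unrelated
# characters: U+00D0 and U+00B5. This is the garbled form.
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# Undoing the wrong interpretation restores the original letter.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)

# The same round trip applied to the full name from the question.
name = "Немања Матејић"
garbled_name = name.encode("utf-8").decode("latin-1")
restored_name = garbled_name.encode("latin-1").decode("utf-8")
print(restored_name == name)
```

If the round trip holds for your data as well, the stored bytes are probably intact and only the character set used to interpret them is wrong somewhere between PHP and MySQL.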