Another utf-8 related problem I believe…
I am using PHP to update data in a MySQL db and then display that data elsewhere on the site. I have run into utf-8 problems before, where special characters are displayed as question marks when viewed in a browser, but this one seems slightly different.
I have a number of records to enter that contain the è character. If I enter this directly in the db then it appears correctly on the page so I take this to mean that utf-8 content is being output correctly.
However, when I try to update the values in the db through PHP, the è character is replaced. What appears instead is & Atilde ; & uml ; (without the spaces), which appears in the browser as Ã¨.
I have the tables in the database set to use UTF-8. I believe this is correct because, as mentioned, if I update the db through phpMyAdmin, it's all OK. Similarly, I have set the character encoding for the page, which seems to be correct. I am also running the SQL statement "SET NAMES 'utf8';" before trying to update the db.
Anyone have any other ideas as to where the problem may lie?
Many thanks
Yup.
The character you have is LATIN SMALL LETTER E WITH GRAVE. As you can see, in UTF-8 that character is encoded into two bytes, 0xC3 and 0xA8.
But in many default, western encodings (such as ISO-8859-1) which are single-byte only, this multi-byte character is decoded as two separate characters, LATIN CAPITAL LETTER A WITH TILDE and DIAERESIS. Notice how they are encoded as C3 and A8, respectively, in ISO-8859-1?
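To make the byte-level picture concrete, here is a minimal sketch that dumps the UTF-8 bytes of è and then deliberately misreads them as ISO-8859-1. It assumes the mbstring extension is available; the variable names are just for illustration.

```php
<?php
// è (U+00E8) written out as its two UTF-8 bytes.
$eGrave = "\xC3\xA8";

// Hex dump of the raw bytes: two bytes, c3 and a8.
$hex = bin2hex($eGrave);

// Misread those bytes as ISO-8859-1 and re-encode the result as UTF-8.
// Each byte is now treated as a separate character: Ã (C3) and ¨ (A8).
$misread = mb_convert_encoding($eGrave, 'UTF-8', 'ISO-8859-1');

echo $hex, "\n";                          // c3a8
echo $misread, "\n";                      // Ã¨
echo mb_strlen($misread, 'UTF-8'), "\n";  // 2
```

One character in, two characters out: that is the signature of UTF-8 bytes being interpreted as a single-byte encoding somewhere along the way.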
Furthermore, it looks like PHP is processing these characters through htmlentities(), which results in the entities &Atilde; and &uml; respectively.
So, where exactly is the problem in your code? Well, htmlentities() could be doing it all by itself, since its 3rd argument is an encoding name, which you may not have properly set to 'UTF-8'. But it could be some other string processing function as well. (Note: As a general rule, it's a bad idea to store HTML entities in the database. This step should be reserved for time of display.)
There are a bunch of other ways to trip yourself up with UTF-8 in PHP. I suggest hitting up the cheatsheet and making sure you're in good shape.
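As a rough illustration of the htmlentities() point, the sketch below runs the same two bytes through the function with each encoding. It is only a guess at what the original code might be doing, not a reconstruction of it; the variable names and the sample string are made up. Note that before PHP 5.4 the encoding argument defaulted to ISO-8859-1, so leaving it out on an older install would behave like the first call.

```php
<?php
// è as its two UTF-8 bytes.
$eGrave = "\xC3\xA8";

// Encoding given as ISO-8859-1: each byte is converted separately.
$wrong = htmlentities($eGrave, ENT_QUOTES, 'ISO-8859-1');

// Encoding given as UTF-8: the two bytes are read as one character.
$right = htmlentities($eGrave, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // &Atilde;&uml;
echo $right, "\n"; // &egrave;

// Storing the raw UTF-8 text and escaping only at display time:
// htmlspecialchars() escapes markup characters but leaves è untouched.
$stored  = "caf\xC3\xA8 <b>"; // hypothetical value read back from the db
$display = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $display, "\n"; // cafè &lt;b&gt;
```

If the first output matches what ends up in the table, the fix is to pass 'UTF-8' explicitly or, better, to drop the entity conversion from the update path and escape only when rendering the page.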