I am in the process of converting my site to UTF-8 encoding, and it is proving to be a bit of a pain. This is what I have done so far:
1) `<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />`
2) `mysql_set_charset("utf8");`
3) Set the character set of the database, table and column to `utf8`
I have some data in a field whose exact content is:
`<p>Tom & Jerry £5</p>`
When I output this data to the page, everything looks correct. However, an HTML validation check fails on the `&` symbol, because it is not being converted to `&amp;`.
I tried to fix this with `htmlentities()`, but it encodes everything, including the angle brackets of the HTML `<p>` tag, so the markup no longer renders correctly. Furthermore, the pound sign appears with a strange character in front of it.
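A minimal sketch that reproduces both symptoms, assuming the PHP source file is saved as UTF-8 and that `$html` (a hypothetical variable) holds the field value. PHP versions before 5.4 defaulted `htmlentities()` to ISO-8859-1, which reads the two UTF-8 bytes of `£` as two separate Latin-1 characters.

```php
<?php
// Hypothetical field value, exactly as stored (UTF-8 bytes).
$html = '<p>Tom & Jerry £5</p>';

// Treating the UTF-8 input as ISO-8859-1 (the pre-5.4 default) splits the
// two bytes of "£" (0xC2 0xA3) into "Â" and "£", hence the stray character.
$legacy = htmlentities($html, ENT_QUOTES, 'ISO-8859-1');

// Even with the correct charset, htmlentities() still escapes the <p> tag,
// so the browser displays the markup as text.
$utf8 = htmlentities($html, ENT_QUOTES, 'UTF-8');

echo $legacy, PHP_EOL; // &lt;p&gt;Tom &amp; Jerry &Acirc;&pound;5&lt;/p&gt;
echo $utf8, PHP_EOL;   // &lt;p&gt;Tom &amp; Jerry &pound;5&lt;/p&gt;
```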
My question is: am I storing the data correctly in the database table? And if so, what do I need to do to ensure everything both displays and validates correctly?
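Use `htmlspecialchars()` instead of `htmlentities()` when outputting UTF-8 text in a web page. It only converts the characters that are special in HTML (`&`, `<`, `>` and quote characters), so characters such as `£` are left as they are, which is fine once the page is served as UTF-8.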
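A minimal sketch of that output step, assuming the field holds plain text (here the hypothetical `$text`) and the `<p>` markup is added by the template. Passing `'UTF-8'` explicitly avoids depending on the PHP version's default charset.

```php
<?php
// Hypothetical plain-text value read from the database (UTF-8).
$text = 'Tom & Jerry £5';

// Only &, <, >, " and ' are converted; the UTF-8 bytes of "£" pass through.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// The markup comes from the template, not from the escaped value.
echo '<p>' . $escaped . '</p>', PHP_EOL; // <p>Tom &amp; Jerry £5</p>
```

If the column stores HTML markup itself, as in the question, `htmlspecialchars()` would escape the `<p>` tag as well, so this sketch only fits when the text and the markup are kept separate.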