I am using HTML Purifier in my PHP project and am having trouble getting it to work properly with user input.
I am having users enter HTML using a WYSIWYG editor (TinyMCE), but whenever a user enters the HTML entity &nbsp; (non-breaking space) it gets saved into the database as this weird foreign character (Â).
However, when I edit the saved entry using the WYSIWYG editor, it gets displayed properly as &nbsp;. It also renders properly on the page, except that in the page source it appears as a literal space rather than the &nbsp; entity.
Also, in the MySQL database it displays as the weird foreign character.
I read the doc about Unicode and HTML Purifier and changed my database and web page encoding to UTF-8, but I am still having problems with the non-breaking space character being mangled. The other HTML entities, such as &lt; and &gt;, get saved as &lt; and &gt;, so why not &nbsp;?
The non-breaking space isn’t being saved in your database as one weird foreign character; it’s being saved as two characters. The Unicode non-breaking space character is encoded in UTF-8 as 0xC2 0xA0, which in ISO-8859-1 looks like ‘Â ‘ (i.e. a weird foreign character followed by a non-breaking space).
You’re probably forgetting to do SET NAMES 'utf8' on your database connection, which causes PHP to send its data to MySQL as ISO-8859-1 (the default).
Have a look at ‘UTF-8 all the way through…‘ to see how to properly set up UTF-8 when using PHP and MySQL.
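To make the two-byte explanation concrete, here is a rough sketch that reproduces the mangling in plain PHP, without a database. It assumes the mbstring extension is available. The helper open_utf8_connection() is a hypothetical name, not something from your code or from HTML Purifier; it only illustrates declaring the connection charset through mysqli's set_charset(), which has the same effect on the connection as issuing SET NAMES yourself.

```php
<?php
// U+00A0 (non-breaking space) written out as its two UTF-8 bytes.
$nbsp = "\xC2\xA0";
echo bin2hex($nbsp), "\n"; // c2a0

// Simulate a connection that reads those bytes as ISO-8859-1:
// each byte becomes its own character and is re-encoded as UTF-8.
$misread = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');
echo bin2hex($misread), "\n";            // c382c2a0
echo mb_strlen($misread, 'UTF-8'), "\n"; // 2

// Hypothetical helper (illustration only): open a mysqli connection
// and declare its character set before any query is sent.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    // 'utf8' matches the SET NAMES value above; 'utf8mb4' is the
    // full-range alternative on MySQL versions that support it.
    $link->set_charset('utf8');
    return $link;
}
```

The second hex dump is the Â (c3 82) followed by a real non-breaking space (c2 a0), which is most likely what you are seeing in the database. If you use PDO instead of mysqli, the charset can be given in the DSN (for example charset=utf8) rather than through a separate call.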