I have an HTML file, and when I view it in Notepad, I can see the following:
<p><span>Copyright © 2008 Your Company Name</span>
Notice the copyright symbol.
I load the HTML and perform this on it:
$html = file_get_contents('test.html');
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
file_put_contents('output.html', $html);
When I view the HTML again in Notepad, the copyright symbol has disappeared and been replaced by a space?!
I want the copyright symbol to be replaced by &copy; or &#169;. Is this not what mb_convert_encoding with the HTML-ENTITIES option does?
This is the test HTML file I am using.
Your test HTML page is not encoded in UTF-8; therefore, when mb_convert_encoding sees the copyright character (ordinal value 169), it doesn’t know what to do with what it perceives as an invalid UTF-8 sequence.
You should therefore specify the correct input encoding when calling mb_convert_encoding:
Alternatively, you can use something like
Note: I am answering your question directly, but you don’t say what you need the conversion for. It’s possible that there may be a better way to achieve your goal.
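To illustrate the point about the input encoding, here is a minimal sketch. It assumes the file is ISO-8859-1 (it could equally be Windows-1252; the question does not say), and it uses a hard-coded string in place of test.html so that it runs on its own. The most direct form of the fix would be to pass that encoding as the third argument of the original mb_convert_encoding call, but the HTML-ENTITIES target is deprecated as of PHP 8.2, so this sketch swaps in a two-step approach: convert to UTF-8 first, then encode the non-ASCII characters with mb_encode_numericentity. The variable names and the conversion map are my own choices, not something from the question.

```php
<?php
// Stand-in for file_get_contents('test.html'): a byte string in which the
// copyright sign is the single byte 0xA9, as in ISO-8859-1 (assumed encoding).
$html = "<p><span>Copyright \xA9 2008 Your Company Name</span>";

// A lone 0xA9 byte is not a valid UTF-8 sequence, which is why declaring
// the input as UTF-8 loses the character.
$isUtf8 = mb_check_encoding($html, 'UTF-8');

// Step 1: convert to UTF-8, naming the encoding the bytes are actually in.
$utf8 = mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1');

// Step 2: replace every code point from U+0080 upwards with a decimal
// numeric entity. Map format: [first, last, offset, mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($utf8, $convmap, 'UTF-8');

var_dump($isUtf8);        // bool(false)
echo $encoded, PHP_EOL;   // <p><span>Copyright &#169; 2008 Your Company Name</span>
```

Unlike htmlentities, this leaves the markup characters <, > and & untouched, so the tags in the page survive. It produces the numeric form &#169; rather than the named &copy;.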