I am trying to convert all &nbsp; instances to regular spaces in PHP:
echo '<meta charset="UTF-8" /> ';
echo html_entity_decode('&nbsp;');
echo html_entity_decode('&nbsp;', ENT_COMPAT, 'UTF-8');
If the first line is commented out, then the output will be in ISO 8859-1 and read:
Â
with a space in front of it. If UTF-8 encoding is specified, it reads:
�
That is an invalid UTF-8 character followed by a space. Is there any way to ensure that all HTML entity spaces are correctly decoded regardless of the encoding?
The space character is really just an example; what I am trying to do is read HTML input from an unspecified charset and display it. So &lt; and &#60; would both become <.
This is a problem with encodings: they are not compatible with each other, so you have to pass a different charset to html_entity_decode for every encoding. However, you may convert the input to UTF-8 first (with iconv) and then use html_entity_decode($string, ENT_COMPAT, 'UTF-8'). If you don't know the encoding of the input, you have to guess.
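A minimal sketch of that approach, assuming PHP 7.1+ with the iconv and mbstring extensions available. The function name decode_entities_to_utf8 is hypothetical. The guessing rule (valid UTF-8 is treated as UTF-8, anything else as ISO-8859-1) is only an assumption for illustration, and it will misjudge input in other charsets.

```php
<?php
// Hypothetical helper: converts input to UTF-8, decodes HTML entities,
// then turns non-breaking spaces into regular spaces.
function decode_entities_to_utf8(string $input, ?string $charset = null): string
{
    if ($charset === null) {
        // Assumption: if the bytes are valid UTF-8, treat them as UTF-8;
        // otherwise fall back to ISO-8859-1. This is only a guess.
        $charset = mb_check_encoding($input, 'UTF-8') ? 'UTF-8' : 'ISO-8859-1';
    }

    if (strcasecmp($charset, 'UTF-8') !== 0) {
        // Convert to UTF-8 first so a single decode call works for every input.
        $converted = iconv($charset, 'UTF-8', $input);
        if ($converted === false) {
            throw new InvalidArgumentException("Cannot convert from $charset");
        }
        $input = $converted;
    }

    // Decode named and numeric entities; the result is UTF-8.
    $decoded = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // &nbsp; decodes to U+00A0 (bytes C2 A0 in UTF-8); replace it with a plain space.
    return str_replace("\u{00A0}", ' ', $decoded);
}
```

For example, decode_entities_to_utf8('a&nbsp;b') returns 'a b', and both '&lt;' and '&#60;' come back as '<'. The page that displays the result still has to declare UTF-8, as the meta tag in the question does.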