I have an XML file which I’m parsing with SimpleXML in PHP. The first line is
<?xml version="1.0" encoding="iso-8859-1"?>
The result of the parse is stored in $xml. If I do:
echo $xml->asXML();
then the entire file displays perfectly.
But if I dig into the structure in any way, I get Â’s everywhere, e.g.:
echo $xml->Chapter->asXML();
Inside some of the XML elements there is MathML (<math>); this is where the Â’s occur.
For example the character ∈ is replaced by a Â.
How can I parse the XML file but not lose the MathML characters?
∈ is not a character that can be represented in ISO 8859-1. Change your XML declaration to say that it is encoded with UTF-8.
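If editing the file by hand is not practical, the declaration could also be rewritten in memory before parsing. This is only a sketch: `declareUtf8` is a hypothetical helper name, and it assumes the bytes really are UTF-8, that the declaration sits at the very start of the string (no BOM), and that there is no whitespace around the `=` after `encoding`.

```php
<?php
// Hypothetical helper (not part of SimpleXML): swap whatever encoding the
// XML declaration names for UTF-8, leaving the rest of the document alone.
function declareUtf8(string $xml): string
{
    $fixed = preg_replace(
        '/^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2/',
        '$1$2UTF-8$2',
        $xml,
        1
    );

    // preg_replace() returns null on a regex error; fall back to the input.
    return $fixed ?? $xml;
}

// "∈" (U+2208) as its three UTF-8 bytes, behind a wrong declaration.
$raw = '<?xml version="1.0" encoding="iso-8859-1"?><math>' . "\xE2\x88\x88" . '</math>';

$xml = simplexml_load_string(declareUtf8($raw));

// The element's text is now the original three bytes, untouched.
echo bin2hex((string) $xml), "\n"; // e28888
```

Fixing the declaration in the source file is still the cleaner option; this only helps when the file comes from somewhere you do not control.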
To give an example demonstrating the problem.
Outputs (as UTF-8) the following.
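The effect can be sketched at the byte level. This is an illustration, assuming the file on disk really holds UTF-8 bytes; the variable names are made up for the example, and the output is shown as hex so it does not depend on the terminal's or browser's charset.

```php
<?php
// "∈" (U+2208) written as its three UTF-8 bytes, so the example does not
// depend on how this script file itself is saved.
$element = "\xE2\x88\x88";

// Identical bytes, two different encoding declarations.
$wrong = '<?xml version="1.0" encoding="iso-8859-1"?><math>' . $element . '</math>';
$right = '<?xml version="1.0" encoding="UTF-8"?><math>' . $element . '</math>';

// Casting the element to string returns its text as UTF-8.
$fromWrong = (string) simplexml_load_string($wrong);
$fromRight = (string) simplexml_load_string($right);

// Each input byte was treated as one ISO 8859-1 character and re-encoded.
echo bin2hex($fromWrong), "\n"; // c3a2c288c288

// With the correct declaration the bytes pass through unchanged.
echo bin2hex($fromRight), "\n"; // e28888
```

The `c2` bytes in the first result are what show up as Â when that output is then viewed as ISO 8859-1.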
SimpleXML will try to convert to UTF-8 when the encoding is set to something different. It is always a good idea not to give it that work to do when the input is already UTF-8 encoded and the encoding declaration is incorrect.

Also be sure that PHP itself is outputting UTF-8, and telling the browser that this is the case!
You can do this by setting the default_charset INI option (in your php.ini or with ini_set()), or by sending the correct Content-Type header (header('Content-Type: text/html; charset=utf-8')).
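A minimal sketch of both options in one script (either one alone should normally be enough). The only real constraint is that header() has to run before any output is sent.

```php
<?php
// Option 1: make UTF-8 the charset PHP advertises by default.
ini_set('default_charset', 'UTF-8');

// Option 2: state the charset explicitly for this response.
// This must come before any output has been sent.
header('Content-Type: text/html; charset=utf-8');

// "\u{2208}" is PHP 7+ syntax for the UTF-8 encoding of U+2208 (∈).
echo "<p>\u{2208}</p>\n";
```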