I’m parsing an RSS feed that has an &#8217; (the ampersand-encoded form of ’) in it. SimpleXML turns this into a literal ’. What can I do to stop this?
Just to answer some of the questions that have come up – I’m pulling an RSS feed using cURL. If I output this directly to the browser, the &#8217; displays as ’, which is what’s expected. When I create a new SimpleXMLElement from it (e.g. $xml = new SimpleXMLElement($raw_feed);) and dump the $xml variable, every instance of &#8217; is replaced with a literal ’.
It appears that SimpleXML is having trouble with UTF-8 ampersand encoded characters. (The XML declaration specifies UTF-8.)
I do have control over the feed after cURL has retrieved it and before it’s used to construct the SimpleXML element.
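For anyone trying to reproduce this, here is a minimal sketch of the behaviour described above. The feed fragment is made up for illustration (it is not the real feed), and it assumes the ampersand-encoded character is &#8217;. It suggests that SimpleXML resolves the reference to the character itself and hands it back as UTF-8 bytes, so whether it then displays correctly depends on the charset the output is interpreted in.

```php
<?php
// Hypothetical feed fragment; the XML declaration specifies UTF-8, as in the question.
$raw_feed = '<?xml version="1.0" encoding="UTF-8"?>'
          . '<rss><channel><title>It&#8217;s a test</title></channel></rss>';

$xml   = new SimpleXMLElement($raw_feed);
$title = (string) $xml->channel->title;

// The numeric character reference has been resolved to U+2019,
// which is the three bytes E2 80 99 in UTF-8.
var_dump(strpos($title, "\xE2\x80\x99") !== false);

// The /u modifier makes PCRE reject invalid UTF-8, so a result of 1
// means the string returned by SimpleXML is well-formed UTF-8.
var_dump(preg_match('//u', $title) === 1);

// Raw bytes of the title, for inspection.
echo bin2hex($title), "\n";
```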
It came down to having to set the default encoding to UTF-8 in four places:
1. The PHP locale: setlocale(LC_ALL, 'en_US.UTF8');
2. The strings themselves: utf8_encode($string);
3. The charset of the MySQL connection: mysqli_set_charset($database_insert_connection, 'utf8');
4. The MySQL collation: utf8_general_ci
If outputting to the browser, also set the appropriate header (e.g. header('Content-type: text/html; charset=utf-8');).
Hope this helps someone in the future!
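A rough sketch of the browser-output part of the above, in case it is useful. The ensure_utf8() helper is hypothetical (it is not part of PHP or of the original fix) and assumes the iconv extension is available. It is there because utf8_encode() treats its input as ISO-8859-1, so running it on a string that is already UTF-8 would encode it twice; utf8_encode() is also deprecated as of PHP 8.2. The database steps are left out since they need a live connection.

```php
<?php
// Send the charset header before any output. The guard keeps this
// safe to run from the command line as well.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: return $string as UTF-8, converting from
// ISO-8859-1 only when it is not already valid UTF-8.
function ensure_utf8(string $string): string
{
    // With the /u modifier, preg_match() returns 1 only for valid UTF-8.
    if (preg_match('//u', $string) === 1) {
        return $string;
    }
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return iconv('ISO-8859-1', 'UTF-8', $string);
}

$already_utf8 = "It\xE2\x80\x99s"; // U+2019 as UTF-8 bytes
$latin1       = "caf\xE9";         // "café" in ISO-8859-1

// Escape for HTML while telling htmlspecialchars() the data is UTF-8.
echo htmlspecialchars(ensure_utf8($already_utf8), ENT_QUOTES, 'UTF-8'), "\n";
echo htmlspecialchars(ensure_utf8($latin1), ENT_QUOTES, 'UTF-8'), "\n";
```

The check before converting is a design choice rather than a requirement: if every input is known to be ISO-8859-1, a plain conversion works just as well.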