I’m trying to get a value with a € sign out of XML, but when I try, it gives back weird characters.
$xmlDate = $searchNode->getElementsByTagName( "kostenvoorverkoop" );
$valueKostenvoorverkoop = htmlentities($xmlDate->item(0)->nodeValue,ENT_QUOTES,"UTF-8");
//gives back Á€10,- instead of €10,-
I can’t find the problem.
//XML
<?xml version="1.0" encoding="ISO-8859-1" ?>
<price>€10</price>
If I leave out the htmlentities() call, it gives a completely weird string like ÁáÙ%10 (not exactly this, but you know what I mean).
If anyone can help me with this, it would help me greatly. Thanks in advance.
Edit:
I found a small workaround: change the € to &euro;. I know it’s not clean, but it works.
The character € does not exist in ISO-8859-1, so this XML declaration can’t possibly be right.
The output Â€ suggests the file has actually been encoded in Windows code page 1252 (Western European), which is similar to ISO-8859-1 but has different characters in the range 0x80–0x9F, including the euro sign. PHP has parsed the data as ISO-8859-1, where the cp1252 encoding of €, byte 0x80, maps to the control character U+0080. It then gives you the Unicode string containing U+0080 as a UTF-8-encoded byte string, the bytes 0xC2 0x80. Outputting that to a browser in a page served as cp1252, as ISO-8859-1 (which browsers treat as cp1252 for tedious, confusing legacy reasons), or without a charset on a Western European machine gives Â€. htmlentities() doesn’t encode this in any way because there’s no HTML entity for the control code U+0080.
Here’s how you should proceed:
If you must have your XML input file in cp1252, state that in the XML declaration: encoding="windows-1252" rather than the inaccurate ISO-8859-1. XML parsers aren’t required to be able to read cp1252, though, so for better interoperability, just use the default UTF-8 encoding and re-save the file to match.
Serve your output HTML page as UTF-8, using a Content-Type header or meta tag. Then use htmlspecialchars() instead of htmlentities() so you don’t waste time encoding non-ASCII characters that don’t need it.
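The diagnosis and the UTF-8 fix can be checked with a short PHP sketch. It assumes the dom and mbstring extensions are enabled. The inline byte string stands in for the asker’s cp1252-encoded file, and the priceText() helper is hypothetical, written only for this illustration.

```php
<?php
// Send the charset before any output, as recommended above.
header('Content-Type: text/html; charset=utf-8');

// Assumption: this byte string stands in for the asker's file. It was saved as
// cp1252 (where byte 0x80 is the euro sign) but declares ISO-8859-1.
$raw = '<?xml version="1.0" encoding="ISO-8859-1" ?>' . "<price>\x8010</price>";

// Hypothetical helper for this sketch: parse the XML and return the text of <price>.
function priceText(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->getElementsByTagName('price')->item(0)->nodeValue;
}

// 1. The problem: parsed as ISO-8859-1, byte 0x80 becomes U+0080,
//    which PHP hands back as the UTF-8 bytes C2 80.
$broken = priceText($raw);
echo bin2hex($broken), "\n"; // c2803130

// htmlentities() leaves it untouched: HTML has no entity for U+0080.
var_dump(htmlentities($broken, ENT_QUOTES, 'UTF-8') === $broken); // bool(true)

// Simulate a browser reading those bytes as cp1252: 0xC2 shows as "Â", 0x80 as "€".
echo mb_convert_encoding($broken, 'UTF-8', 'Windows-1252'), "\n"; // Â€10

// 2. The fix: re-save the file as UTF-8 and update the declaration to match.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
$utf8 = str_replace('encoding="ISO-8859-1"', 'encoding="UTF-8"', $utf8);
$fixed = priceText($utf8);

// With the page served as UTF-8, only the HTML-special characters need escaping.
echo htmlspecialchars($fixed, ENT_QUOTES, 'UTF-8'), "\n"; // €10
```

Declaring encoding="windows-1252" instead would also work with parsers that happen to support cp1252, but converting the file once to UTF-8 avoids depending on that.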