$msg = "<body><a>áéíóú☻♥♦♣</a></body>";
$temp_dom = new DOMDocument();
$temp_dom->loadHTML($msg);
$dom_xpath = new DOMXpath($temp_dom);
$ele = $dom_xpath->query('//a')->item(0);
echo "<pre>";
echo "Original: $msg\n";
echo $ele->nodeValue;
echo "</pre>";
Output:
Original: áéíóú☻♥♦♣
áéÃóúâ»â¥â¦â£
The current document encoding is UTF-8.
I tried ANSI too, and the same problem happened.
Using utf8_decode solves the problem:
echo utf8_decode($ele->nodeValue);
The thing is, I use a lot of attributes and a lot of functions, so I would have to call utf8_decode in every one of them, and I don't believe that's the correct approach.
Does anyone know how I could do this?
Please run this test before posting an answer, because I've already tried a lot of things.
Thank you very much in advance.
The problem is that you need to tell DOMDocument what the encoding is as the HTML is parsed. You can't do this by setting the encoding property. (I believe that affects how the document is output with saveHTML.) A slightly hackish way to do this is to insert a statement of the encoding into the document. You can do this simply by inserting '<?xml encoding="UTF-8">' before the HTML you are parsing.
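A minimal sketch of that fix applied to the test from the question, assuming the PHP file itself is saved as UTF-8. The prefix is added once, inside a hypothetical helper named loadUtf8Html (not part of PHP's DOM API), so nodes and attributes read later need no utf8_decode calls.

```php
<?php
// loadUtf8Html() is a hypothetical helper name used for illustration only.
// It prepends the encoding hint once, at load time, so every value read
// from the resulting document is already proper UTF-8.
function loadUtf8Html(string $html): DOMDocument
{
    $dom = new DOMDocument();
    // The pseudo XML declaration tells libxml's HTML parser the input is UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    return $dom;
}

$msg = "<body><a>áéíóú☻♥♦♣</a></body>";
$dom = loadUtf8Html($msg);
$xpath = new DOMXPath($dom);
$ele = $xpath->query('//a')->item(0);

// No utf8_decode() here: the text node was decoded correctly during parsing.
echo $ele->nodeValue, "\n";
```

Run as-is, this should print the original characters rather than the garbled output shown in the question. Keeping the hint inside one loading function means the rest of the code can use nodeValue and getAttribute directly.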
Note, however, that this does insert an extra node as a child of the document object (a DOMProcessingInstruction, to be precise), so be aware of this if you are doing anything with $temp_dom->childNodes or suchlike.
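If that extra node gets in the way, one option (a sketch, not part of the original answer) is to remove it right after loading. It assumes the hint ends up as a processing-instruction node directly under the document, as described above; removing it afterwards does not change text that was already parsed as UTF-8.

```php
<?php
$msg = "<body><a>áéíóú☻♥♦♣</a></body>";

$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="UTF-8">' . $msg);

// childNodes is a live list, so copy it into an array before removing
// anything; removing while iterating the live list can skip siblings.
foreach (iterator_to_array($dom->childNodes) as $node) {
    if ($node->nodeType === XML_PI_NODE) {
        $dom->removeChild($node);
    }
}

// The parsed text is unaffected by dropping the hint afterwards.
$xpath = new DOMXPath($dom);
echo $xpath->query('//a')->item(0)->nodeValue, "\n";
```

After this loop, iterating $dom->childNodes only yields the document's real nodes.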