Well, apparently, PHP and its standard libraries have some problems, and DOMDocument isn’t an exception.
There are workarounds for UTF-8 characters when loading an HTML string – $dom->loadHTML().
However, I haven’t found a way to do this when loading HTML from a file – $dom->loadHTMLFile(). While it reads and sets the encoding from <meta /> tags, the problem strikes back if I haven’t defined those. For instance, when loading a fragment of HTML (a template part like footer.html), not a fully built HTML document.
So, how do I preserve UTF-8 characters when loading HTML from a file that doesn’t have its <meta /> tags present, and defining those is not an option?
Update
footer.html (the file is encoded in UTF-8 without BOM):
<div id="footer">
<p>My sūpēr ōzōm ūtf8 štrīņģ</p>
</div>
index.php:
$dom = new DOMDocument;
$dom->loadHTMLFile('footer.html');
echo $dom->saveHTML(); // the non-ASCII characters come out garbled
Thanks in advance!
While I’m not sure how to go about solving the problem with ->loadHTMLFile(), have you considered using file_get_contents() to get the HTML, running mb_convert_encoding() on that string, and then passing that value in to ->loadHTML()?
Edit: Also, when you initialize DOMDocument, are you giving it the $encoding argument?
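Here is a rough sketch of that read-convert-load idea, in case it helps. Two things in it are my own assumptions: the helper name loadUtf8HtmlFile() is made up for illustration, and I swapped mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8') for mb_encode_numericentity(), because the 'HTML-ENTITIES' target is deprecated as of PHP 8.2. It also assumes the file really is UTF-8, as in the question. Either way the idea is the same: turn every non-ASCII character into a numeric entity so the parser never has to guess the encoding.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of DOMDocument): load a UTF-8 HTML
 * fragment from a file without relying on <meta /> tags.
 */
function loadUtf8HtmlFile(string $path): DOMDocument
{
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // Encode every code point from U+0080 upwards as a numeric entity
    // (&#NNN;), leaving plain ASCII markup untouched.
    $map  = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $html = mb_encode_numericentity($html, $map, 'UTF-8');

    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->loadHTML($html);

    return $dom;
}

// Usage: recreate footer.html from the question in a temp file.
$path = tempnam(sys_get_temp_dir(), 'footer');
file_put_contents(
    $path,
    "<div id=\"footer\">\n<p>My sūpēr ōzōm ūtf8 štrīņģ</p>\n</div>\n"
);

$dom = loadUtf8HtmlFile($path);
echo $dom->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;

unlink($path);
```

Note that loadHTML() still wraps the fragment in html and body elements, so to get only the fragment back you would pass the node you want to saveHTML(), e.g. $dom->saveHTML($dom->getElementById('footer')).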