I’m reading an HTML document that contains UTF-8 chars but when I access the innerHTML of the document, all the “bad” chars show up as 0xfffd. I’ve tried it in all the major browsers and it behaves the same way. When I alert() the innerHTML it shows those chars as a “diamond with a ? mark”.
Surprisingly, the following works perfectly, correctly displaying the UTF-8 char in the alert box, so it’s not that alert() is malfunctioning.
alert("Doppelg\u00e4nger!");
Why can’t I access the UTF-8 chars using innerHTML? Or is there another way to access them in JavaScript?
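First, check whether the document header contains a charset declaration such as <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.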
You can also read out the meta tags with JavaScript.
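One way to read the declared charset from the meta tags is sketched below. The helper name findDeclaredCharset is hypothetical (it is not a built-in), and the sketch assumes the charset is declared either as a charset attribute or inside the content attribute of an http-equiv="Content-Type" tag.

```javascript
// Hypothetical helper, for illustration only: returns the charset named in
// the document's <meta> tags (lowercased), or null if none is declared.
function findDeclaredCharset(doc) {
  const metas = doc.getElementsByTagName('meta');
  for (let i = 0; i < metas.length; i++) {
    // Short form: <meta charset="utf-8">
    const direct = metas[i].getAttribute('charset');
    if (direct) {
      return direct.toLowerCase();
    }
    // Older form: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    const httpEquiv = metas[i].getAttribute('http-equiv') || '';
    const content = metas[i].getAttribute('content') || '';
    const match = /charset\s*=\s*["']?([^\s;"']+)/i.exec(content);
    if (httpEquiv.toLowerCase() === 'content-type' && match) {
      return match[1].toLowerCase();
    }
  }
  return null;
}
```

In a browser you would call findDeclaredCharset(document). The standard document.characterSet property reports the encoding the browser actually used, which is worth comparing against the declared one.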
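If it does, that explains the behavior: the file is probably not actually saved as UTF-8, so the browser hits byte sequences that are invalid in UTF-8 and substitutes U+FFFD for them. You can try changing utf-8 to ISO-8859-1 in the meta tag's charset.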
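The mismatch can be reproduced outside the page. The sketch below assumes the file was saved as ISO-8859-1 (an assumption; the question does not say how it was saved) and decodes the same bytes both ways, using the standard TextDecoder API.

```javascript
// Bytes of "Doppelgänger!" as an ISO-8859-1 file would store them:
// "ä" is the single byte 0xE4, which is not a valid UTF-8 sequence on its own.
const latin1Bytes = Uint8Array.from([
  0x44, 0x6f, 0x70, 0x70, 0x65, 0x6c, 0x67, 0xe4, 0x6e, 0x67, 0x65, 0x72, 0x21,
]);

// Decoding as UTF-8 (what a charset=utf-8 declaration asks for) replaces
// the invalid byte with U+FFFD, the "diamond with a ? mark".
const asUtf8 = new TextDecoder('utf-8').decode(latin1Bytes);

// ISO-8859-1 maps every byte to the code point with the same value,
// so a direct byte-to-character conversion recovers the original text.
const asLatin1 = String.fromCharCode(...latin1Bytes);

console.log(asUtf8.charCodeAt(7).toString(16)); // code unit where "ä" should be
console.log(asLatin1);
```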
A better approach is to HTML-encode all extended characters in your HTML.
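A minimal sketch of such an encoder follows. The function name htmlEncode and its exact behavior are assumptions for illustration: it replaces every character outside [a-zA-Z] with a decimal numeric character reference.

```javascript
// Hypothetical helper, for illustration only: replaces every character that
// is not an ASCII letter with a decimal numeric character reference.
function htmlEncode(str) {
  // The "u" flag makes the regex match whole code points, so characters
  // outside the Basic Multilingual Plane become a single reference.
  return str.replace(/[^a-zA-Z]/gu, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  });
}

console.log(htmlEncode('Doppelg\u00e4nger'));
```

Because the output is pure ASCII, it reads the same regardless of which charset the page is served or declared with.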
Mind you, this function will encode everything that is not [a-zA-Z]. It will encode Doppelgänger as Doppelg&#228;nger, for example.