I was wondering what the best practice is: converting all UTF-8 special characters into HTML entities, or only escaping &, < and >?
I’m working on several PHP projects, and Google is displaying some incorrectly encoded UTF-8 text in its results for a random part of my website.
I think this is because of one or both of the two following reasons:
- My hosting provider didn’t automatically send the encoding headers (I already fixed this).
- The text in the description was not fully escaped.
Besides that, I noticed that most major company websites don’t send the '<?xml version' line and don’t escape their characters.
Are there downsides (or upsides) to escaping all characters vs only the minimum necessary?
Converting any characters beyond <, >, &, " and ' (as done by htmlspecialchars()) is not necessary nowadays. If the page’s character set is properly configured, it is no problem to use native UTF-8 characters (or characters from whichever character set you choose). Converting them into entities has no advantage. Entities are sometimes used as a misguided workaround for character set issues, but this is almost never a good idea.
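To make the difference concrete, here is a minimal sketch, assuming the page is served as UTF-8. The helper name escape_html() and the sample string are made up for illustration; header(), htmlspecialchars() and htmlentities() are the standard PHP functions.

```php
<?php
// Declare the character set explicitly instead of relying on the host's defaults.
// The guard avoids a warning if output has already started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: escapes only the characters that are special in HTML.
function escape_html(string $text): string
{
    // ENT_QUOTES converts both double and single quotes.
    // ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD
    // instead of making the function return an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$description = 'Café & crème <brûlée>';

// Minimal escaping: accented characters stay as native UTF-8.
echo escape_html($description), "\n";
// Café &amp; crème &lt;brûlée&gt;

// Full conversion: same result in the browser, just longer markup.
echo htmlentities($description, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";
// Caf&eacute; &amp; cr&egrave;me &lt;br&ucirc;l&eacute;e&gt;
```

Both lines render identically in a browser once the encoding is declared correctly, which is why the second form buys you nothing. If the output still looks wrong, the likely culprit is the encoding of the data or the declared character set, not the lack of entities.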