I’m building an XML file from scratch and need to know if htmlentities() converts every character that could potentially break an XML file (and possibly UTF-8 data)?
The values will come from a Twitter/Flickr feed, so I need to be sure.
htmlentities() is not a guaranteed way to build legal XML.
Use htmlspecialchars() instead of htmlentities() if this is all you are worried about. If you have encoding mismatches between the representation of your data and the encoding of your XML document, htmlentities() may serve to work around or cover them up (and it will bloat your XML in doing so). I believe it’s better to get your encodings consistent and just use htmlspecialchars().
Also, be aware that if you put the return value of htmlspecialchars() inside XML attributes delimited with single quotes, you will need to pass the ENT_QUOTES flag as well, so that any single quotes in your source string are properly encoded. I suggest doing this anyway, as it makes your code immune to bugs resulting from someone using single quotes for XML attributes in the future.
Edit: To clarify: htmlentities() will convert a number of non-ASCII characters (I assume this is what you mean by UTF-8 data) to entities (which are represented with just ASCII characters). However, it cannot do so for any characters which do not have a corresponding entity, and so it cannot guarantee that its return value consists only of ASCII characters. That’s why I’m suggesting not to use it.
If encoding is a possible issue, handle it explicitly (e.g. with iconv()).
Edit 2: Improved the answer, taking into account Josh Davis’s comment below.
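To make the advice above concrete, here is a minimal sketch of escaping a value with htmlspecialchars() and ENT_QUOTES, plus an explicit conversion with iconv(). The helper name xml_escape(), the sample strings, and the assumption that the incoming data is ISO-8859-1 are all hypothetical and only there for illustration. The ENT_XML1 and ENT_SUBSTITUTE flags are my addition as well and assume PHP 5.4 or later; treat this as one possible approach rather than the definitive one.

```php
<?php
// Hypothetical helper (not part of the original answer): escape a UTF-8
// string so it can be placed in an XML text node or attribute value.
function xml_escape(string $value): string
{
    // ENT_QUOTES escapes both double and single quotes.
    // ENT_XML1 (assumed: PHP >= 5.4) selects XML 1.0 rules, so ' becomes &apos;.
    // ENT_SUBSTITUTE replaces invalid UTF-8 sequences with U+FFFD instead of
    // making the function return an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');
}

// Assumption for illustration: the feed delivers ISO-8859-1 bytes.
// Convert explicitly to the encoding of the XML document before escaping.
$latin1 = "caf\xE9";                              // "café" in ISO-8859-1
$utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1);  // "café" in UTF-8

$tweet = "Tom & Jerry's <" . $utf8 . "> \"live\"";

echo '<?xml version="1.0" encoding="UTF-8"?>', "\n";
// Single-quoted attribute: safe because ENT_QUOTES escaped the apostrophe.
echo "<status text='", xml_escape($tweet), "'/>", "\n";
```

Only the five characters that are special in XML markup get escaped here; the é stays as a literal UTF-8 character, which is fine as long as the document really is declared and served as UTF-8.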