I am using DOMDocument to load some user-contributed HTML blocks and then manipulate them. It appears (assuming I am doing everything correctly) that DOMDocument is running the URLs inside href attributes through htmlentities. As a result, anchor tags that have ampersands in the query string come out with the ampersand escaped.
Example:
$html = <<<HTML
<a href="http://foo.com?bar=baz&foo=bar">Foo</a>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html);
echo $dom->saveHTML();
The output becomes (notice the & in the URL was converted to &amp;):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><a href="http://foo.com?bar=baz&amp;foo=bar">Foo</a></body></html>
Additionally, during the call to $dom->loadHTML($html); the following warnings were output…
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1
I have no idea what that means.
Am I missing something?
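In valid HTML and XHTML, the ampersand introduces a character entity reference (for example &amp;, &lt; or &quot;). A literal ampersand, including one inside an href attribute, therefore has to be written as &amp;.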
See this reference list:
http://www.w3schools.com/tags/ref_entities.asp
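DOMDocument is complaining because it found a bare & that does not start a valid entity on the way in (it read "&foo" and expected a closing ";"), and it corrected it to &amp; on the way out. The output is the correct form: a browser decodes &amp; back to & when it reads the attribute, so the link still points to the same URL.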
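If the goal is just to silence the warning and confirm that the URL itself is intact, something along these lines may help. This is a minimal sketch, assuming the same $html as in the question; libxml_use_internal_errors() and libxml_get_errors() are standard PHP functions, while the variable names $errors, $link and $href are only illustrative.

```php
<?php
// Same markup as in the question: the ampersand in the href is not escaped.
$html = <<<HTML
<a href="http://foo.com?bar=baz&foo=bar">Foo</a>
HTML;

// Collect libxml parse problems internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// The parser messages are still available for logging if wanted.
$errors = libxml_get_errors();
libxml_clear_errors();

$link = $dom->getElementsByTagName('a')->item(0);

// getAttribute() returns the decoded value, i.e. the URL with a plain "&".
$href = $link->getAttribute('href');
echo $href, "\n";

// Serializing the node escapes the ampersand again, as valid HTML requires.
echo $dom->saveHTML($link), "\n";
```

The escaping only exists in the serialized markup. Anything read through the DOM API (getAttribute() here) is the plain URL, so there should be no need to undo the &amp; by hand.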