This is a weird problem, and I can't see an easy solution.
If you use PHP's DOMDocument to parse a document that has a </head> tag inside a JavaScript function, it doesn't work correctly: the parser treats the </head> inside the JavaScript string as the document's closing </head> tag.
I have been wrestling with this for hours. Any ideas?
<?php
$contents =
<<<EOF
<!DOCTYPE html>
<html><head>
<script>function myFunc() { var myVar = "<head></head>"; } </script>
</head>
<body><p>This is a test</p></body>
</html>
EOF;
//GET CONTENT & LOAD INTO DOM
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($contents);
//STRIP OUT THE JAVASCRIPT
$scripts = $doc->getElementsByTagName('script');
// The node list is live, so keep removing the first remaining <script> until none are left
while ($scripts->length > 0) {
    $script = $scripts->item(0);
    $script->parentNode->removeChild($script);
}
echo htmlentities($doc->saveHTML());
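One possible workaround, sketched under assumptions: HTML 4 specified that the first "</" followed by a name character ends a script element's content, and libxml's HTML parser (which loadHTML() uses) appears to follow that rule, so escaping those sequences as "<\/" before parsing should keep the script intact. escapeScriptEndTags() below is a hypothetical helper, not part of PHP's DOM API, and it assumes "</" inside scripts only appears in string literals or comments, where "<\/" means the same thing to JavaScript.

```php
<?php

/**
 * Hypothetical helper (not part of PHP's DOM extension): rewrites every "</"
 * inside a <script> element's body as "<\/" so that an HTML 4-style parser
 * does not end the script early. The real closing </script> tag is left alone.
 * Assumes "</" in scripts only occurs inside string literals or comments.
 */
function escapeScriptEndTags(string $html): string
{
    $result = preg_replace_callback(
        // Group 1: opening tag, group 2: script body (lazy), group 3: closing tag.
        '#(<script\b[^>]*>)(.*?)(</script\s*>)#is',
        function (array $m): string {
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on a regex engine error
    // (e.g. backtrack limit exceeded); fall back to the unmodified input.
    return $result ?? $html;
}
```

It would then be called as $doc->loadHTML(escapeScriptEndTags($contents)); in place of the plain loadHTML($contents) call. A regex is a blunt tool for HTML, so this is only a pre-parse patch for the "</" case, not a general HTML sanitizer.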
This is a common JavaScript issue. Use this instead: