I’m importing some arbitrary HTML into a DOMDocument using the loadHTML() function, eg.: $html

Asked: May 26, 2026 (2026-05-26T13:02:23+00:00)

I’m importing some arbitrary HTML into a DOMDocument using the loadHTML() function, e.g.:

$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument;
$doc->loadHTML($html);

I then want to change a few attributes/node values using DOMDocument methods which I can do no problem.

Once I’ve made these changes I’d like to export the HTML string (using ->saveHTML()), without the <html><body>... tags that the DOMDocument automatically adds to the HTML.

I understand why these are added (to ensure a valid document), but how would I go about just getting my edited HTML back (essentially everything between the <body> tags)?

I have read this post, and while it offers some solutions I would rather do this ‘properly’, i.e. without using a string replace on the <body> tags. Validity of the HTML is not an issue, as it is run through an HTML purifier beforehand.

Any ideas? Thanks.

EDIT

I’m aware of the $node parameter added to saveHTML() in PHP 5.3.6, but unfortunately I’m stuck with 5.2.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T13:02:24+00:00

Perhaps the source code of SmartDOMDocument will help. It uses a regex to strip out the unnecessary strings:

http://beerpla.net/projects/smartdomdocument-a-smarter-php-domdocument-class/

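// Excerpt from the linked SmartDOMDocument source: strip the DOCTYPE and
// the <html><body> wrapper from the output of saveHTML().
$content = preg_replace(
    array("/^\<\!DOCTYPE.*?<html><body>/si", "!</body></html>$!si"),
    "",
    $this->saveHTML()
);

return $content;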

saveHTMLExact() – DOMDocument has an extremely badly designed “feature” where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off).

Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).

SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
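If a string replace is what you want to avoid, one possible alternative on PHP 5.2 is to serialise each child of <body> yourself. The sketch below is not taken from SmartDOMDocument; the helper name bodyInnerHtml() is made up for illustration. It relies on saveXML() accepting a node argument, which it has done since PHP 5.0, so the output is XML-flavoured (for example <br/> instead of <br>), which may or may not matter for your input.

```php
<?php
// Hypothetical helper (the name is illustrative): returns the markup inside
// <body> by serialising each child node, instead of stripping the wrapper
// with a regex or string replace.
function bodyInnerHtml(DOMDocument $doc)
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was loaded, so there is no <body> to read
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        // saveXML() takes an optional node and serialises only that subtree.
        // Caveat: this is XML serialisation, so empty elements self-close.
        $out .= $doc->saveXML($child);
    }
    return $out;
}

$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument;
$doc->loadHTML($html);

echo bodyInnerHtml($doc), "\n";
```

For the example fragment from the question this should give back the original markup without the DOCTYPE, <html> or <body> wrapper. On PHP 5.3.6 or later, saveHTML($child) could be used in the loop instead to keep HTML-style serialisation.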

Also, other questions have asked similar things:

How to saveHTML of DOMDocument without HTML wrapper?
