I’m using PHP’s DOMDocument to normalize user-submitted HTML. I parse the content with the loadHTML method, then get a well-formed result via saveHTML:
$dom = new DOMDocument();
$dom->loadHTML('<div><p>Hello World');
$well_formed = $dom->saveHTML();
echo $well_formed;
This does a beautiful job of parsing the fragment and adding the appropriate closing tags. The problem is that I’m also getting a bunch of tags I don’t want such as <!DOCTYPE>, <html>, <head> and <body>. I understand that every well-formed HTML document needs these tags, but the HTML fragment I’m normalizing is going to be inserted into an existing valid document.
In your case, you do not want to work with an HTML document but with an HTML fragment, that is, a portion of HTML code, which means DOMDocument is not quite what you need.
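That said, if you have to stay with DOMDocument, one possible workaround is to let it build the full document and then serialize only the nodes that ended up inside <body>. The following is a minimal sketch, not a definitive implementation: normalize_fragment is a hypothetical helper name, and it assumes PHP 7.1 or later (saveHTML has accepted a node argument since PHP 5.3.6).

```php
<?php
// Hypothetical helper (not part of DOMDocument): parse an HTML fragment and
// return only the markup that ends up inside <body>.
function normalize_fragment(string $html): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings about malformed markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into a body element
    }

    // Serialize each child of <body>, skipping the doctype/html/body wrappers.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo normalize_fragment('<div><p>Hello World');
```

Note that this only repairs the structure. It does not filter out unsafe markup such as scripts, and loadHTML treats the input as ISO-8859-1 unless the markup declares another charset, so non-ASCII input may need extra handling.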
Instead, I would rather use something like HTMLPurifier (quoting):
And if you try your portion of code:
<div><p>Hello World
Using the demo page of HTMLPurifier, you get this clean HTML as an output:
Much better, isn’t it? 😉
(Note that HTMLPurifier supports a wide range of options, and that taking a look at its documentation might not hurt.)
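For reference, calling HTMLPurifier from your own code instead of the demo page could look roughly like the sketch below. It assumes the library was installed with Composer (package ezyang/htmlpurifier) and that the autoloader lives at vendor/autoload.php. Both the path and the purify_fragment helper name are illustrative assumptions, and the default configuration is used as-is.

```php
<?php
// Assumption: HTML Purifier was installed with Composer
// (package "ezyang/htmlpurifier"); adjust the path to your project layout.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

// Hypothetical helper name, used here only for illustration.
function purify_fragment(string $html): ?string
{
    if (!class_exists('HTMLPurifier')) {
        return null; // library not available in this environment
    }

    // Default configuration; see the library documentation for the options.
    $config = HTMLPurifier_Config::createDefault();
    $purifier = new HTMLPurifier($config);

    // purify() returns the cleaned fragment, without doctype/html/body wrappers.
    return $purifier->purify($html);
}

$clean = purify_fragment('<div><p>Hello World');
echo $clean ?? 'HTML Purifier is not installed';
```

Besides closing the open tags, the purifier also strips markup it considers unsafe, which is usually what you want for user-submitted HTML.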