I wish to replace all the link URLs in some HTML with a number sign (#).
The following basically works; however, to my dismay, it inserts <!DOCTYPE>, <html>, and <body> tags around the modified HTML. Is it possible to keep these tags from being inserted? Is there a better way to do this?
Thank you
$html_with_urls = '<p>hello. Here is a <a href="http://somesite.com">link</a>. Goodby</p>';
libxml_use_internal_errors(true); // Temporarily suppress errors resulting from improperly formed HTML
$doc = new DOMDocument();
$doc->loadHTML($html_with_urls);
$a = $doc->getElementsByTagName('a');
foreach ($a as $link) {
    if ($link->hasAttribute('href')) {
        $link->setAttribute('href', '#');
    }
}
$html_without_urls = $doc->saveHTML();
libxml_use_internal_errors(false);
echo($html_with_urls . '<br />' . $html_without_urls);
In my opinion, the DOMDocument class has no option that keeps saveHTML() from adding that extra markup. It returns a complete, valid HTML document. For your particular case, you could strip those parts off the document yourself:
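The snippet that belonged here is not shown, so the following is only a minimal sketch of what such stripping could look like, not the original code. The helper name strip_document_wrapper() and its regular expression are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of DOMDocument): removes the doctype and the
// opening/closing <html> and <body> tags that saveHTML() wraps around a fragment.
function strip_document_wrapper(string $html): string
{
    $stripped = preg_replace('~<!DOCTYPE[^>]*>|</?(?:html|body)\b[^>]*>~i', '', $html);
    // preg_replace() returns null on failure; fall back to the unmodified input.
    return trim($stripped ?? $html);
}

$html_with_urls = '<p>hello. Here is a <a href="http://somesite.com">link</a>. Goodby</p>';

libxml_use_internal_errors(true); // Suppress warnings from improperly formed HTML
$doc = new DOMDocument();
$doc->loadHTML($html_with_urls);
foreach ($doc->getElementsByTagName('a') as $link) {
    if ($link->hasAttribute('href')) {
        $link->setAttribute('href', '#');
    }
}
$html_without_urls = strip_document_wrapper($doc->saveHTML());
libxml_use_internal_errors(false);

echo $html_without_urls;
```

A pattern like this is blunt: it would also remove a literal <html> or <body> tag that appears inside the fragment itself (for example in a script or comment), and it does nothing about a <head> element if the parser adds one.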
[From the code you can tell when it won't work :)]
Or, if you can use a third-party library, you could use SmartDOMDocument. Just call its saveHTMLExact() method instead of saveHTML().
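Another route, which may be available without a third-party library, is passing libxml parser flags to loadHTML(). This is a sketch that assumes PHP 5.4 or later built against libxml 2.7.8 or later, where the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD constants are defined. The variable name $html_fragment is only illustrative.

```php
<?php
$html_with_urls = '<p>hello. Here is a <a href="http://somesite.com">link</a>. Goodby</p>';

libxml_use_internal_errors(true); // Suppress warnings from improperly formed HTML
$doc = new DOMDocument();
// NOIMPLIED: do not add implied <html>/<body> elements.
// NODEFDTD: do not add a default doctype when none is present.
$doc->loadHTML($html_with_urls, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($doc->getElementsByTagName('a') as $link) {
    if ($link->hasAttribute('href')) {
        $link->setAttribute('href', '#');
    }
}
$html_fragment = $doc->saveHTML();
libxml_clear_errors();
libxml_use_internal_errors(false);

echo $html_fragment;
```

This tends to behave best when the fragment has a single top-level element, as in the example. With several top-level elements the parser may nest the later ones inside the first, so wrapping the input in a temporary container element may be needed.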