I am trying to extract the image tags from an HTML document.
I have:
$parser = new DOMDocument;
$parser->loadHTML($this->html);
foreach ($parser->getElementsByTagName('img') as $imgNode) {
    echo $parser->saveHTML($imgNode);
}
$this->html contains a large amount of HTML and JavaScript.
For example:
<div id='someid'>
<button id='bt' onclick='clickme()'>click me</button>
<img src='test.jpg'/>
.....
.....
more...
</div>
<div>
.....
.....
more...
I get a warning saying:
DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
I am not sure how to fix this, and I don't know whether there is a better way to extract all the images from such a large HTML document.
Any ideas?
Thanks a lot!
I am in no way an expert on these matters (yet), but I hope this helps in some way.
According to this answer by troelskn, you can make the DOM parser more tolerant of badly formed HTML by using libxml_use_internal_errors, which stops libxml from reporting its parse errors as PHP warnings. That might help you get rid of that warning.
Parsing all images of a document can be done by using DOMXPath. It takes a DOMDocument as a parameter and lets you run XPath queries on the document. DOMXPath::query returns a DOMNodeList, which can be looped through using DOMNodeList::item, which returns a DOMNode.
Disclaimer: The code I posted is untested and was put together using the manual.
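A minimal sketch of how the two suggestions might fit together, not a definitive implementation. The $html string is a made-up stand-in for $this->html, and the second image URL with a bare "&" is an assumption about what kind of markup can provoke the "expecting ';'" warning (an ampersand that does not start a valid entity). The $images array is only there for illustration.

```php
<?php
// Hypothetical sample input standing in for $this->html. The unescaped "&"
// in the second src is the kind of markup that can trigger
// "htmlParseEntityRef: expecting ';'".
$html = "<div id='someid'>"
      . "<button id='bt' onclick='clickme()'>click me</button>"
      . "<img src='test.jpg'/>"
      . "<img src='thumb.php?w=10&h=20'/>"
      . "</div>";

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected errors and restore the previous error handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Run an XPath query that selects every <img> element in the document.
$xpath = new DOMXPath($doc);
$images = [];
foreach ($xpath->query('//img') as $imgNode) {
    $images[] = $imgNode->getAttribute('src'); // keep the src for later use
    echo $doc->saveHTML($imgNode), "\n";       // print the tag's markup
}
```

If you want to inspect what the parser complained about rather than throw it away, libxml_get_errors() can be called before libxml_clear_errors().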