I am loading a bunch of RSS feeds using DOM, and sometimes one returns a 404 instead of the feed. The problem is that the web server sends an HTML 404 page in place of the expected XML file, so with this code:
$rssDom = new DOMDocument();
$rssDom->load($url);
$channel = $rssDom->getElementsByTagName('channel');
$channel = $channel->item(0);
$items = $channel->getElementsByTagName('item');
I get this warning:
Warning: DOMDocument::load() [domdocument.load]: Entity 'nbsp' not defined
Followed by this error:
Fatal error: Call to a member function getElementsByTagName() on a non-object
Normally this code works fine, but whenever a feed returns a 404 it fails as shown above. I tried wrapping the load statement in a standard try-catch, but that doesn't seem to catch it.
You can suppress the output of parsing errors with libxml_use_internal_errors(true).
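A minimal sketch of that suppression, assuming you also want to stop before the fatal error: parse errors are collected instead of printed, the load call's return value is checked, and the code only touches the channel if one exists. The helper name loadRssString is hypothetical, and it takes the markup as a string (via loadXML) rather than a URL so it can be tried without a network; the same calls apply around $rssDom->load($url).

```php
<?php
// Hypothetical helper (not from the original answer): parse a feed body
// without printing libxml warnings, returning its <channel> element or null.
function loadRssString(string $markup): ?DOMElement
{
    // DOMDocument::loadXML('') throws a ValueError on PHP 8, so reject it up front.
    if ($markup === '') {
        return null;
    }

    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $ok = $dom->loadXML($markup); // false if the markup is not well-formed

    // Drop the collected errors (inspect libxml_get_errors() first if you
    // want to log them) and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return null;
    }

    // A document can parse and still not be RSS, so check for <channel>
    // before calling getElementsByTagName() on it.
    $channel = $dom->getElementsByTagName('channel')->item(0);
    return $channel instanceof DOMElement ? $channel : null;
}
```

Returning null instead of a half-built DOM lets the calling loop simply skip a broken feed.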
To check whether the response was a 404, inspect $http_response_header after the call to DOMDocument::load().
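$http_response_header is a plain array of header lines with a status line such as "HTTP/1.1 404 Not Found". When the HTTP wrapper follows redirects, each response's status line ends up in the array, so a sketch like the one below keeps the last one. The helper name lastStatusCode is hypothetical, and the usage comment assumes, as described above, that the array is populated after load() for an http:// URL.

```php
<?php
// Hypothetical helper: extract the final HTTP status code from a
// $http_response_header-style array. After redirects the array holds the
// status line of every response, so the last match wins.
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code; // null if no status line was found
}

// Usage sketch inside the feed loop (assumes $rssDom->load($url) was just
// called in this same scope with an http:// URL):
// if (isset($http_response_header) && lastStatusCode($http_response_header) === 404) {
//     continue; // skip this feed
// }
```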
The alternative would be to use file_get_contents, then check the response header, and only if it is not a 404 load the markup with DOMDocument::loadXML. This prevents DOMDocument from ever trying to parse the HTML error page as XML. Note that all this assumes the server correctly returns a 404 status code in the response.
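A sketch of that alternative, under a few assumptions: the function name fetchRssChannel is hypothetical, it treats any non-2xx final status as a failure rather than only 404, and it sets the HTTP context option ignore_errors so that file_get_contents returns the error page body instead of failing, which leaves the status check to our own code.

```php
<?php
// Hypothetical helper: fetch a feed and return its <channel>, or null on a
// failed request, a non-2xx status, or a body that is not an RSS document.
function fetchRssChannel(string $url): ?DOMElement
{
    // 'ignore_errors' makes the HTTP wrapper return 4xx/5xx bodies too.
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false || $body === '') {
        return null;
    }

    // $http_response_header is only set for http(s):// URLs.
    if (isset($http_response_header)) {
        $status = null;
        foreach ($http_response_header as $line) {
            if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
                $status = (int) $m[1]; // keep the last status line (after redirects)
            }
        }
        if ($status === null || $status < 200 || $status >= 300) {
            return null; // e.g. the 404 case: never hand the HTML page to DOM
        }
    }

    // Only now parse the markup, collecting any libxml errors quietly.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $ok = $dom->loadXML($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return null;
    }

    $channel = $dom->getElementsByTagName('channel')->item(0);
    return $channel instanceof DOMElement ? $channel : null;
}
```

The caller then only needs a null check, for example calling getElementsByTagName('item') on the result only when it is not null.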