I have an input XML file that is not well-formed (i.e. it contains a bare `&` instead of `&amp;`).
When I try to load this XML with PHP DOM using `$doc->load("file.xml")`, it throws an error and stops parsing.
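A minimal sketch for seeing exactly what the parser is objecting to: with `libxml_use_internal_errors(true)`, libxml collects parse problems instead of emitting them as warnings, and `libxml_get_errors()` returns them with line and column numbers. The inline sample string is a hypothetical stand-in for file.xml; with the real file you would call `$doc->load('file.xml')` instead of `loadXML()`.

```php
<?php
// Collect libxml parse errors instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical sample input containing a bare '&'; with the real file,
// use $doc->load('file.xml') instead of loadXML().
$xml = '<root><item>Fish & Chips</item></root>';

$doc = new DOMDocument();
$ok = $doc->loadXML($xml); // false when the document is not well-formed

// Grab everything libxml recorded during the failed parse.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    // Each LibXMLError carries the message plus the line/column it refers to.
    printf("Line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
}

// Reset the error buffer so later parses start clean.
libxml_clear_errors();
```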
Is there any way to load this malformed XML? And no, I can't edit the source XML file.
I did try `$doc->loadHTML()`, but it throws errors all over the place.
I want to know if there is a proper way to do this (for example, loading the file contents and fixing them with a regex or something similar).
First, check that it's the `&` that's causing the error and not something else. One way or another, you'll have to modify the XML to get it parsed. `loadHTML()` loads its input from a string (and so does `loadXML()`), so can't you just read the file into a string and replace the invalid characters with the correct ones before parsing?

If your installation supports the PHP Tidy extension (http://php.net/manual/en/book.tidy.php), you could try to clean the input up with that, though in my experience it's far from foolproof.
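A minimal sketch of the read, fix, then parse approach, assuming the only problem in the file is unescaped ampersands. `escapeBareAmpersands()` is a hypothetical helper, not a PHP built-in: it escapes every `&` that does not already begin one of XML's five predefined entities or a numeric character reference. Other named entities such as `&nbsp;` would also get escaped, and an `&` inside a CDATA section or comment (where it is legal) would be altered too, so treat this as a starting point rather than a general repair tool.

```php
<?php
// Hypothetical helper: escape any '&' that does not already start one of
// XML's five predefined entities or a numeric character reference.
function escapeBareAmpersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );
}

// In practice: $raw = file_get_contents('file.xml');
$raw = '<root><item>Fish & Chips &amp; Peas &#169;</item></root>';

$doc = new DOMDocument();
$ok = $doc->loadXML(escapeBareAmpersands($raw));

if ($ok) {
    // textContent decodes the entities again, so this prints the plain text.
    echo $doc->documentElement->textContent, "\n";
}
```

The negative lookahead is what keeps references that are already correct, like `&amp;` or `&#169;`, from being double-escaped.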