I’m working on a PHP parser that parses my school’s HTML ‘groups’ pages. Each page has a unique URL based on the name of the course and several other variables, and consists of a series of HTML <table> elements.
Loading the HTML from the URL works fine until it comes across a ) in the file’s content. Then it just stops loading and only stores what it has read so far. The HTML was not created by me, so there is no way I can prevent such characters from appearing in it.
However, it works fine when I run it locally using MAMP. I tried looking for answers, but haven’t found anything that solved my problem.
How can I escape these characters before loading the HTML?
My current PHP:
$dom = new DOMDocument;
libxml_use_internal_errors(true); // the HTML I parse contains a lot of unclosed tags; this keeps the libxml errors from being displayed on the page
$dom->loadHTMLFile('http://isarog.hhs.nl/Web_Site/HHS/ICTM/Public/Iris_Roster/Timetables/11_2/11_2-CMD-4vt-p2.html');
echo $dom->getElementsByTagName('html')->item(0)->nodeValue;
This question solved my problem: Remove control characters from php String
Apparently there was an invisible character in my HTML input that was causing the load function to stop reading. The following cleared it all up:
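The exact snippet is not reproduced here, so what follows is only a minimal sketch of that kind of fix, assuming the offending bytes were ASCII control characters. The function names `strip_control_chars` and `load_clean_html` are hypothetical, and the sample string stands in for the real page content:

```php
<?php
// Hypothetical helper (not from the original post): removes ASCII control
// characters but keeps tab (\x09), line feed (\x0A) and carriage return (\x0D).
function strip_control_chars(string $html): string
{
    // preg_replace returns null on failure; fall back to the input in that case.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $html) ?? $html;
}

// Hypothetical helper: cleans the markup first, then parses it from a string
// with loadHTML() instead of letting loadHTMLFile() read the URL directly.
function load_clean_html(string $html): DOMDocument
{
    libxml_use_internal_errors(true); // collect parse errors instead of printing them
    $dom = new DOMDocument;
    $dom->loadHTML(strip_control_chars($html));
    libxml_clear_errors();
    return $dom;
}

// Stand-in input with an invisible NUL byte in the middle of the text.
$raw = "<html><body><p>before\x00after</p></body></html>";
$dom = load_clean_html($raw);
echo $dom->getElementsByTagName('p')->item(0)->textContent;
```

For the real page, `$raw` would come from something like `file_get_contents()` on the timetable URL, which assumes `allow_url_fopen` is enabled on the server. If the invisible character turns out to lie outside the ASCII control range, the pattern would need to be adjusted accordingly.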