I tried several methods to find out which part of an HTML string is invalid:
$dom->loadHTML($badHtml);
$tidy->cleanRepair();
simplexml_load_string($badHtml);
None of them makes it clear which part of the HTML is invalid. Maybe an extra config option for one of them can fix that. Any ideas?
I need this to manually fix HTML input from users. I don’t want to rely on automated processes.
I’d try loading the offending HTML into a DOMDocument (as you are already doing) and then using SimpleXML to fix things. You should be able to run a quick diff between the original and the repaired markup to see where the errors are.
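If the diff is too coarse, another option (a different technique from the diff suggested above) is to ask libxml for the parse errors it collected while `loadHTML()` ran. The following is only a rough sketch: `findHtmlErrors` and the sample `$badHtml` value are made-up names for illustration, and the exact messages and positions depend on the libxml2 version PHP was built against.

```php
<?php
// Hypothetical helper (not part of PHP): returns the parse errors libxml
// recorded while loading an HTML string, with their line and column.
function findHtmlErrors(string $html): array
{
    // Collect errors in libxml's internal buffer instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object.
        $errors[] = [
            'line'    => $error->line,
            'column'  => $error->column,
            'message' => trim($error->message),
        ];
    }

    // Empty the buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $errors;
}

// Sample input, assumed for illustration: a stray closing </b> tag.
$badHtml = "<div><p>Hello</b></div>";

foreach (findHtmlErrors($badHtml) as $e) {
    printf("line %d, column %d: %s\n", $e['line'], $e['column'], $e['message']);
}
```

The reported positions refer to the input string as the parser saw it, so they should point you close to the markup that needs fixing by hand. Note that `loadHTML()` rejects an empty string, so check for that before calling the helper.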