I recently read this SO post …the first answer is nuts. Basically, it says this is theoretically impossible for large models because of the Chomsky grammar types.
What is the alternative? I don’t want to use a library object like DOMDocument; I want to understand the correct way to do this in pure code.
If you don’t want to use DOMDocument (though I’d urge you to look into it again; it’s not that bad, especially combined with DOMXPath), you can also use phpQuery or Simple HTML DOM Parser.
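
For what it's worth, here is a minimal sketch of the DOMDocument plus DOMXPath combination, in case it makes the approach look less daunting. The `$html` string and the `//a[@href]` query are made-up illustrations, not something from your question; swap in your own markup and XPath expression.

```php
<?php
// Hypothetical sample markup (an assumption for illustration only).
$html = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>';

// Collect libxml parse warnings internally instead of emitting them,
// since real-world HTML is often not well-formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Query the parsed tree with XPath: every <a> element that has an href attribute.
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//a[@href]') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}

print_r($links);
```

The point of the design is that the parser builds the tree for you, so nesting depth stops being your problem, and the XPath expression only describes which nodes you want.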