I’ve been using htmldoc for a while, but I’ve run into some fairly serious limitations. I need the end solution to work on a Linux box. I’ll be calling this library/utility/application from a Perl app, so any Perl interfaces would be a bonus.
NOTE: This answer is from 2008 and is probably out of date by now; please check the other answers.
PrinceXML is the best one I’ve seen (it parses regular HTML as well as XML/XHTML). How is it the best? Well, it passes the Acid2 test, which I thought was pretty darn impressive.
It is, however, quite expensive.
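For what it's worth, Prince is driven from the command line, so calling it from another program mostly comes down to building an argument list and running it. Below is a minimal sketch in Python, assuming the `prince` binary is installed and on the PATH; the function names `build_prince_command` and `html_to_pdf` are made up for illustration and are not part of Prince. The same argument list should carry over to the list form of Perl's `system`.

```python
import subprocess
from typing import List


def build_prince_command(html_path: str, pdf_path: str,
                         prince_bin: str = "prince") -> List[str]:
    """Return the argument list for converting one HTML file to PDF.

    Hypothetical helper: it only assembles the basic command-line form
    `prince input.html -o output.pdf`.
    """
    return [prince_bin, html_path, "-o", pdf_path]


def html_to_pdf(html_path: str, pdf_path: str,
                prince_bin: str = "prince") -> None:
    """Run Prince on html_path and write the result to pdf_path.

    Assumes the Prince binary is installed; raises CalledProcessError
    if it exits with a non-zero status.
    """
    cmd = build_prince_command(html_path, pdf_path, prince_bin)
    # Passing a list (no shell) keeps odd characters in file names
    # from being interpreted by a shell.
    subprocess.run(cmd, check=True)
```

Calling `html_to_pdf("report.html", "report.pdf")` would then run Prince once per document; the file names here are placeholders.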