In .NET, I found a great library, HtmlAgilityPack, that lets you easily parse non-well-formed HTML using XPath. I’ve used it for a couple of years in my .NET sites, but I’ve had to settle for more painful libraries in my Python, Ruby, and other projects. Is anyone aware of similar libraries for other languages?
In Python, ElementTidy parses tag soup and produces an element tree, which you can then query using XPath.
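To make the idea concrete, here is a rough sketch of the same workflow (tag soup in, element tree out, XPath-style query on top). It does not use ElementTidy's own API. It swaps in Python's standard library instead (`html.parser` plus `xml.etree.ElementTree`), so it runs without installing anything. The names `SoupTreeBuilder`, `parse_soup`, and the synthetic `document` root are made up for illustration, and the recovery rules are far cruder than what a Tidy-based parser would do. ElementTree also supports only a subset of XPath.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SoupTreeBuilder(HTMLParser):
    """Hypothetical helper: turns lenient HTML into an ElementTree."""

    def __init__(self):
        super().__init__()
        self._builder = ET.TreeBuilder()
        self._open = []  # names of tags that are currently open
        # Synthetic root, so the result always has exactly one top element.
        self._builder.start("document", {})

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        self._builder.start(tag, {name: value or "" for name, value in attrs})
        if tag in VOID_TAGS:
            self._builder.end(tag)
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self._open:
            return  # stray closing tag: ignore it
        # Close every unclosed descendant, then the tag itself.
        while self._open:
            open_tag = self._open.pop()
            self._builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self._builder.data(data)

    def finish(self):
        self.close()  # flush any text still buffered by HTMLParser
        while self._open:
            self._builder.end(self._open.pop())
        self._builder.end("document")
        return self._builder.close()


def parse_soup(html):
    """Parse possibly malformed HTML and return the root Element."""
    parser = SoupTreeBuilder()
    parser.feed(html)
    return parser.finish()


# Unclosed <p>, <b>, <a>, and <html> tags are all tolerated.
root = parse_soup("<html><body><p>Hello <b>world<p>Second <a href='/x'>link</body>")
hrefs = [a.get("href") for a in root.findall(".//a[@href]")]
print(hrefs)  # ['/x']
```

Unlike a real tidying parser, this sketch never auto-closes a `<p>` when the next `<p>` opens, so the second paragraph ends up nested inside the first. For anything beyond a toy, a dedicated library is the better choice.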