I want to use lxml to scrape some HTML pages that have nested form elements. Even BeautifulSoup chokes on these pages; the only parser I’ve found so far that can handle them is MinimalSoup, which has no knowledge of which tags can or cannot be nested.
Does lxml have any parsers that don’t care about nested form tags?
Any other suggestions?
If I have to, I’ll just continue using MinimalSoup.
How about lxml.etree.HTMLParser? That should work relatively well, right?
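Something like the following is a minimal sketch of that approach. The sample markup and the variable names (`broken_html`, `input_names`) are made up for illustration, and how the nested form ends up in the tree depends on the underlying libxml2 recovery rules, so it may be restructured rather than kept nested:

```python
from io import StringIO
from lxml import etree

# Hypothetical sample markup with one form nested inside another.
broken_html = """
<html><body>
<form action="/outer">
  <input name="a">
  <form action="/inner">
    <input name="b">
  </form>
</form>
</body></html>
"""

# HTMLParser is lenient and tries to recover from invalid markup.
parser = etree.HTMLParser()
tree = etree.parse(StringIO(broken_html), parser)

root = tree.getroot()

# Collect the input fields regardless of how the forms were arranged.
input_names = [el.get("name") for el in root.iter("input")]

# Dump the recovered tree to see what the parser made of the nesting.
print(etree.tostring(root, pretty_print=True, method="html").decode())
```

Printing the tree is worth doing once on a real page, since it shows whether the parser kept the inner form where you expect it.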
And you have your tree!