I want to scrape some html pages that have nested form elements with lxml.

Question

Asked: May 24, 2026

I want to scrape some HTML pages that have nested form elements with lxml. Even BeautifulSoup chokes on these pages. The only parser I've found so far that can handle them is MinimalSoup, which has no knowledge of which tags can or cannot be nested.

Does lxml have any parsers that don't care about nested form tags?
Any other suggestions?

If I have to I’ll just continue using MinimalSoup.

1 Answer

Editorial Team · Answer 1 · 2026-05-24T05:50:00+00:00

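How about lxml.etree.HTMLParser? It is built on libxml2's HTML parser, which recovers from malformed markup, so it should cope with nested forms reasonably well, although it may rearrange them in the resulting tree.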

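from urllib.request import urlopen  # Python 3 replacement for urllib2
import lxml.etree as etree

url = "http://example.com/page.html"  # placeholder: put the page you want to scrape here
page = urlopen(url)
parser = etree.HTMLParser()  # lenient parser that recovers from broken markup
tree = etree.parse(page, parser)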

And you have your tree!
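
If you want to check how the parser treats nested forms before pointing it at a real page, here is a minimal sketch that parses a small inline snippet. The markup and the ids/names in it (outer, inner, a, b, c) are made up for illustration and are not from your pages. It only looks elements up in document order, because the parser may move the inner form out of the outer one while recovering, so don't rely on the nesting being preserved.

```python
import lxml.etree as etree

# Hypothetical test markup: one form nested inside another.
html = """
<html><body>
<form id="outer">
  <input name="a">
  <form id="inner">
    <input name="b">
  </form>
  <input name="c">
</form>
</body></html>
"""

parser = etree.HTMLParser()
root = etree.fromstring(html, parser)  # root is the <html> element

# iter() walks the whole tree in document order, so this works whether
# the inner form stayed nested or was turned into a sibling.
form_ids = [form.get("id") for form in root.iter("form")]
input_names = [inp.get("name") for inp in root.iter("input")]

print(form_ids)
print(input_names)

# Print the recovered tree to see where each element actually ended up.
print(etree.tostring(root, pretty_print=True).decode())
```

Looking at the printed tree tells you whether each input still sits under the form you expect. If it doesn't, select the inputs by name or position rather than by walking down from the form element.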
