Sometimes, I want to parse HTML to extract URLs. I find [html.parser.HTMLParser] and [re.match]


Asked: May 26, 2026


Sometimes I want to parse HTML to extract URLs.
I found that [html.parser.HTMLParser] and [re.match] can both do the job.
I want to know which is faster.

Is there a Python module like jQuery for parsing HTML?

If you have a better solution, please leave a comment.

Thanks

lxml is very good.
It makes the job really simple.

>>> from urllib.request import urlopen
>>> from lxml.html import parse
>>> for url in parse(urlopen('http://www.stackoverflow.com')).getroot().find_class('question-hyperlink'):
...     print(url.get('href'))


1 Answer

Editorial Team · Answer 1 · 2026-05-26T07:00:21+00:00


I would strongly suggest lxml. In my experience, it is the fastest. lxml builds a tree in memory, so you can parse, serialize, …
On the other hand, if you have to pick between the two options you mentioned, I'd suggest timing both with the timeit module and deciding from the results.
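A rough sketch of what such a timeit comparison might look like, using only the standard library. The sample HTML, the `LinkCollector` class, the helper function names, and the regular expression are all assumptions made for illustration. The regex is deliberately simple (double-quoted `href` values on `<a>` tags only), and it uses `findall` rather than `re.match`, since `re.match` only matches at the start of the string.

```python
import re
import timeit
from html.parser import HTMLParser

# Hypothetical sample document, repeated so each run has some work to do.
SAMPLE_HTML = (
    '<p><a href="http://example.com/a">A</a> '
    '<a class="x" href="/b">B</a></p>\n'
) * 1000


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag is already lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_with_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Assumption: only double-quoted href attributes on <a> tags are of interest.
HREF_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"', re.IGNORECASE)


def links_with_regex(html):
    return HREF_RE.findall(html)


if __name__ == "__main__":
    # Sanity check: both approaches agree on this (well-formed) sample.
    assert links_with_parser(SAMPLE_HTML) == links_with_regex(SAMPLE_HTML)
    for func in (links_with_parser, links_with_regex):
        seconds = timeit.timeit(lambda: func(SAMPLE_HTML), number=20)
        print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```

The timings depend on the input and the machine, so it is worth running this against HTML that resembles the real pages. A regex this simple will also miss single-quoted or unquoted attributes, which a real parser handles.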
