I wrote a python script that processes a large amount of downloaded webpages HTML(120K

Question

Asked: May 31, 2026

I wrote a Python script that processes a large number of downloaded web pages (120K HTML pages). I need to parse them and extract some information. I tried BeautifulSoup, which is easy and intuitive, but it runs very slowly. Since this will have to run routinely on a weak machine (on Amazon), speed is important. Is there an HTML/XML parser in Python that is much faster than BeautifulSoup, or must I resort to regex parsing?


1 Answer

Editorial Team · Answer 1 · 2026-05-31T09:30:25+00:00

lxml is a fast XML and HTML parser built on the C libraries libxml2 and libxslt: http://lxml.de/parsing.html
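As a rough illustration of what parsing one page with lxml might look like, here is a minimal sketch. It assumes lxml is installed (`pip install lxml`); the function name `extract_title_and_links` and the sample markup are made up for this example, and the fields you actually extract will depend on your pages.

```python
# Sketch only: requires the third-party package lxml (pip install lxml).
import lxml.html


def extract_title_and_links(html_text):
    """Return (title, hrefs) for one HTML page. Hypothetical helper."""
    # fromstring() parses the markup with libxml2's lenient HTML parser,
    # so unclosed tags like the <p> in the sample below are tolerated.
    root = lxml.html.fromstring(html_text)
    # findtext() returns None when the page has no <title> element.
    title = root.findtext(".//title")
    # Selecting an attribute with XPath returns a list of strings.
    hrefs = root.xpath("//a/@href")
    return title, hrefs


# Made-up sample page, standing in for one of the downloaded files.
sample = (
    "<html><head><title>Demo</title></head>"
    "<body><a href='/a'>A</a><p>unclosed<a href='/b'>B</a></body></html>"
)
title, hrefs = extract_title_and_links(sample)
print(title, hrefs)
```

If you would rather keep the BeautifulSoup API, it can also use lxml as its underlying parser via `BeautifulSoup(html_text, "lxml")`, which is usually faster than the default `html.parser`, though calling lxml directly avoids the extra layer.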
