I have some random HTML and I used BeautifulSoup to parse it, but in

Asked: May 24, 2026 (2026-05-24T06:53:59+00:00)

I have some random HTML and I used BeautifulSoup to parse it, but in most cases (>70%) it chokes. I tried BeautifulSoup 3.0.8 and 3.2.0 (there were some problems with 3.1.0 upwards), but the results are almost the same.

I can recall several HTML parser options available in Python off the top of my head:

BeautifulSoup
lxml
pyquery

I intend to test all of these, but I would like to know which one, in your experience, is the most forgiving and will still try to parse bad HTML.

1 Answer

Editorial Team · Answer 1 · 2026-05-24T06:54:00+00:00

I ended up using BeautifulSoup 4.0 with html5lib for parsing, which is much more forgiving. With some modifications to my code it's now working considerably well. Thanks, all, for the suggestions.
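For anyone landing here, a minimal sketch of what that setup can look like, assuming the `beautifulsoup4` package is installed and, ideally, `html5lib` as well. The helper name `parse_forgiving`, the fallback order, and the sample markup are my own illustration, not the code I actually ran; `lxml` and the built-in `html.parser` are only tried if `html5lib` is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Parsers to try, in order of preference. html5lib follows the HTML5
# error-recovery rules that browsers use, so it goes first.
# (The fallback order is an assumption for this sketch.)
PARSER_PREFERENCE = ("html5lib", "lxml", "html.parser")


def parse_forgiving(markup):
    """Return (soup, parser_name) using the first installed parser."""
    for name in PARSER_PREFERENCE:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
    raise RuntimeError("no usable HTML parser found")


# Hypothetical broken input: nothing is closed properly.
broken = "<p>Unclosed paragraph<li>stray item<b>bold"

soup, parser_used = parse_forgiving(broken)
print("parsed with:", parser_used)
print(soup.get_text(" ", strip=True))
```

The exact tree can differ between parsers for the same broken input, so it is worth checking the result against your real pages before settling on one.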
