Question

Asked: May 13, 2026

From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I’ve chosen BeautifulSoup for a project I’m working on, but I chose it for no particular reason other than finding the syntax a bit easier to learn and understand. But I see a lot of people seem to favour lxml and I’ve heard that lxml is faster.

So I’m wondering what are the advantages of one over the other? When would I want to use lxml and when would I be better off using BeautifulSoup? Are there any other libraries worth considering?

1 Answer

Editorial Team · Answer 1 · 2026-05-13T06:13:20+00:00

~~For starters, BeautifulSoup is no longer actively maintained, and the author even recommends alternatives such as lxml.~~

Quoting from the linked page:

Version 3.1.0 of Beautiful Soup does significantly worse on real-world HTML than version 3.0.8 does. The most common problems are handling tags incorrectly, “malformed start tag” errors, and “bad end tag” errors. This page explains what happened, how the problem will be addressed, and what you can do right now.

This page was originally written in March 2009. Since then, the 3.2 series has been released, replacing the 3.1 series, and development of the 4.x series has gotten underway. This page will remain up for historical purposes.

tl;dr

Use 3.2.0 instead.
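
If you are not sure which series a given install falls into, a rough check on the version string is enough. The sketch below is only an illustration: the helper name `in_affected_series` is hypothetical, and it assumes you already have the version as a dotted string, however you obtained it.

```python
def in_affected_series(version: str) -> bool:
    """Return True if `version` belongs to the 3.1 series mentioned in the quote."""
    parts = version.strip().split(".")
    # Only the major and minor components matter here, so "3.1.0.1" still counts as 3.1.
    return parts[:2] == ["3", "1"]


# The three versions named in the quoted page.
for candidate in ("3.0.8", "3.1.0", "3.2.0"):
    if in_affected_series(candidate):
        print(candidate, "is in the 3.1 series; consider 3.2.0 instead")
    else:
        print(candidate, "is not in the 3.1 series")
```

This only compares version numbers; it says nothing about how well any particular release parses your HTML.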
