I am using BeautifulSoup and urllib2 for downloading HTML pages and parsing them. Problem

Asked: May 11, 2026

I am using BeautifulSoup and urllib2 to download HTML pages and parse them. The problem is malformed HTML: although BeautifulSoup handles malformed pages well, it is still not as good as Firefox.

Since Firefox and WebKit are more up to date and more resilient at handling HTML, I think it would be ideal to use one of them to construct and normalize the DOM tree of a page and then manipulate that tree through Python.

However, I can't find any Python bindings for this. Can anyone suggest a way?

I ran into some solutions that run a headless Firefox process and drive it from Python, but is there a more Pythonic solution available?

1 Answer

score 0 · Answer 1 · 2026-05-11T14:30:13+00:00

Added an answer on May 11, 2026 at 2:30 pm

Perhaps pywebkitgtk would do what you need.
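
If embedding a browser engine turns out to be heavier than needed, one alternative worth trying (not pywebkitgtk, and not something the question mentions) is to keep BeautifulSoup but switch its parser to html5lib, which aims to follow the same HTML5 parsing rules that browsers use. The sketch below is only an illustration under that assumption: it uses the `bs4` package on Python 3, the helper name `normalize_html` is made up for this example, and it falls back to the standard-library parser when html5lib is not installed. Fetching the page is left out; the markup is a hard-coded string.

```python
from bs4 import BeautifulSoup

try:
    import html5lib  # noqa: F401  optional dependency, assumed for this sketch
    PARSER = "html5lib"      # browser-style HTML5 tree construction
except ImportError:
    PARSER = "html.parser"   # stdlib fallback, less browser-like


def normalize_html(markup):
    """Parse possibly malformed markup and return a BeautifulSoup tree.

    Hypothetical helper for illustration; the parser choice above decides
    how closely the repaired tree matches what a browser would build.
    """
    return BeautifulSoup(markup, PARSER)


# Deliberately broken input: unclosed <li> and <p> tags.
broken = "<ul><li>first<li>second</ul><p>unclosed paragraph"

soup = normalize_html(broken)
items = [li.get_text() for li in soup.find_all("li")]

print("parser used:", PARSER)
print("list items:", items)
print(soup.prettify())
```

With html5lib the two list items should come out as siblings, the way a browser would repair them; with the fallback parser the second item may end up nested inside the first, which is the kind of difference the question is about.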
