I’m using BeautifulSoup to parse through data on baseball-reference.com and it works fine for

Question


Asked: June 9, 2026


I’m using BeautifulSoup to parse data on baseball-reference.com, and it works fine for every page except a few like this one. Pages of the same kind (with different data) work perfectly, e.g. this one.
I’m trying to pick out the tables that have ‘stats_table’ as one of their classes. I use this code:

bs = BeautifulSoup(stream, 'lxml', parse_only=SoupStrainer('table'))

and then I do something like:

for table in bs.find_all('table'):
    # Show each table's attributes (id, class, ...) to see which tables were parsed
    print(table.attrs)
    # ... further processing ...

It is obvious from table.attrs that this code doesn’t see the batting and pitching tables, even though they are there in the page. I repeat: the same code works fine for almost all other pages like this.
Looking over str(bs) clearly shows that

ANY ideas?


1 Answer

Editorial Team · Answer 1 · 2026-06-09T02:05:58+00:00


As you posted in the comments, there are errors on the page. You should use HTML Tidy to clean it up: http://pypi.python.org/pypi/pytidylib/0.2.1

You can check HTML Tidy at work: http://validator.w3.org/
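
As a rough sketch of how this could fit together (not a tested fix for that specific page): the first helper below uses only the standard library to list the classes of every table start tag in the raw markup, which may help confirm that the tables are present before any tree building happens. The second helper runs the markup through pytidylib's tidy_document and then parses it the same way as in the question. The names TableClassCollector, raw_table_classes and stats_tables_after_tidy are made up for illustration, and the second helper assumes pytidylib, the libtidy C library, bs4 and lxml are all installed.

```python
from html.parser import HTMLParser


class TableClassCollector(HTMLParser):
    """Record the class list of every <table> start tag in the raw markup."""

    def __init__(self):
        super().__init__()
        self.table_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = dict(attrs).get("class") or ""
            self.table_classes.append(classes.split())


def raw_table_classes(html):
    """Scan start tags only, so broken nesting elsewhere does not hide a table."""
    collector = TableClassCollector()
    collector.feed(html)
    collector.close()
    return collector.table_classes


def stats_tables_after_tidy(html):
    """Clean the markup with HTML Tidy, then parse it as in the question."""
    # Imported lazily: pytidylib also needs the libtidy C library (assumption:
    # both are installed, along with bs4 and lxml).
    from tidylib import tidy_document
    from bs4 import BeautifulSoup, SoupStrainer

    cleaned, errors = tidy_document(html)
    soup = BeautifulSoup(cleaned, "lxml", parse_only=SoupStrainer("table"))
    # class_ matches tables that have 'stats_table' among their classes.
    return soup.find_all("table", class_="stats_table"), errors
```

If raw_table_classes reports a 'stats_table' entry that bs.find_all('table') never yields, the markup errors are the likely culprit, and the errors string returned by tidy_document should list what Tidy had to repair.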
