I’m having some difficulties with BeautifulSoup. You can find the HTML here -> http://pastebin.com/Nr1k0dcM


Editorial Team

Asked: June 13, 2026 (2026-06-13T09:49:27+00:00)


I’m having some difficulties with BeautifulSoup.

You can find the HTML here -> http://pastebin.com/Nr1k0dcM

After that, I simply run:

soup = BeautifulSoup(html)
print soup.prettify()

The prettified output should not differ from the original HTML, but I only get this -> http://pastebin.com/Y6DmEj40

I really don’t understand what’s going on here…

EDIT:

This is one of the URLs I’m scraping, for example: http://fantasy.premierleague.com/entry/38861/event-history/8/

I’m only scraping the HTML from <!-- pitch view --> to <!-- end ismPitch --> because otherwise I get the following error:

HTMLParser.HTMLParseError: bad end tag: u"</scri'+'pt>", at line 89, column 222

So what I’m doing right now is the following:

response = requests.get(url, headers=headers)
html = response.text
tablestart = html.find('<!-- pitch view -->') + 19
tableend = html.find('<!-- end ismPitch -->')
html = html[tablestart:tableend]
soup = BeautifulSoup(html)
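
The slicing step above can be sketched as a small helper that takes the offset from the marker's own length rather than a hard-coded 19, and that reports a missing marker instead of silently slicing from the wrong position (`str.find` returns -1 when nothing is found). This is only a sketch under assumptions: it targets Python 3, and the name `slice_between` and the sample string are hypothetical, made up for illustration.

```python
def slice_between(html, start_marker, end_marker):
    """Return the text strictly between two markers, or None if either is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the opening marker itself
    end = html.find(end_marker, start)  # search only after the opening marker
    if end == -1:
        return None
    return html[start:end]


# Tiny made-up page, used only for illustration (not the real site's HTML).
sample = (
    "<html><!-- pitch view --><table><tr><td>x</td></tr></table>"
    "<!-- end ismPitch --></html>"
)
fragment = slice_between(sample, "<!-- pitch view -->", "<!-- end ismPitch -->")
print(fragment)
```

Checking for `None` before handing the fragment to BeautifulSoup makes it obvious when the page layout has changed and a marker is no longer present.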


1 Answer

Editorial Team · Answer 1 · 2026-06-13T09:49:28+00:00

I would implement the code above like this:

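import urllib2
from bs4 import BeautifulSoup

# Fetch the raw page (urllib2 is Python 2 only)
response = urllib2.urlopen("http://fantasy.premierleague.com/entry/38861/event-history/8/")
html = response.read()

# Start just after the opening marker; 19 is len('<!-- pitch view -->')
tablestart = html.find('<!-- pitch view -->') + 19
print tablestart
# End just before the closing marker
tableend = html.find('<!-- end ismPitch -->')
print tableend

# Parse only the fragment between the two markers
html = html[tablestart:tableend]
soup = BeautifulSoup(html)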

The output of the above code is:

55594
92366
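
The `bad end tag: u"</scri'+'pt>"` error quoted in the question appears to come from inline JavaScript that builds a closing script tag by string concatenation, which a strict parser then trips over. One possible workaround, instead of slicing by comment markers, is to drop the script blocks before parsing. The following is a rough Python 3 sketch, not a general HTML sanitizer; the names `SCRIPT_BLOCK` and `strip_scripts` and the sample markup are hypothetical.

```python
import re

# Matches an opening <script ...> tag, its body, and the real closing </script>.
# A split "</scri'+'pt>" inside the body does not match "</script", so the
# non-greedy body keeps going until the genuine closing tag.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.DOTALL | re.IGNORECASE)


def strip_scripts(html):
    """Remove every <script>...</script> block from an HTML string."""
    return SCRIPT_BLOCK.sub("", html)


# Made-up snippet that mimics the construct named in the error message.
sample = (
    "<p>before</p>"
    "<script>document.write('<scri'+'pt src=\"a.js\"></scri'+'pt>');</script>"
    "<p>after</p>"
)
cleaned = strip_scripts(sample)
print(cleaned)
```

The cleaned string can then be passed to `BeautifulSoup`. Naming the parser explicitly as the second argument, for example `BeautifulSoup(cleaned, "html.parser")`, or `"lxml"` or `"html5lib"` if those packages are installed, may also change how such markup is handled, since the parsers differ in how lenient they are.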
