I’m having some difficulties with BeautifulSoup.
You can find the HTML here: http://pastebin.com/Nr1k0dcM
After that I simply run:
soup = BeautifulSoup(html)
print soup.prettify()
The prettified output should contain the same markup as the original HTML, but I only get this: http://pastebin.com/Y6DmEj40
I really don’t understand what’s going on here…
EDIT:
This is one of the URLs I’m scraping, for example: http://fantasy.premierleague.com/entry/38861/event-history/8/
I’m only scraping the HTML between the <!-- pitch view --> and <!-- end ismPitch --> comments, because otherwise I get the following error:
HTMLParser.HTMLParseError: bad end tag: u"</scri'+'pt>", at line 89, column 222
So what I’m doing right now is the following:
response = requests.get(url, headers=headers)
html = response.text
# Keep only the markup between the two comment markers
# (19 is the length of '<!-- pitch view -->').
tablestart = html.find('<!-- pitch view -->') + 19
tableend = html.find('<!-- end ismPitch -->')
html = html[tablestart:tableend]
soup = BeautifulSoup(html)
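For what it’s worth, the slicing step above could be wrapped in a small helper so that a missing marker is detected instead of silently producing a wrong slice (html.find returns -1 when the marker is absent). This is only a sketch: extract_between is a hypothetical name, not part of requests or BeautifulSoup, and the sample string is a made-up stand-in for response.text.

```python
def extract_between(html, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing.

    Hypothetical helper for illustration; not part of requests or BeautifulSoup.
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    # Skip past the start marker itself instead of hard-coding its length.
    start += len(start_marker)
    # Only look for the end marker after the start marker.
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]


# Made-up stand-in for response.text, just to show the call.
sample = (
    "<p>before</p><!-- pitch view -->"
    "<table><tr><td>x</td></tr></table>"
    "<!-- end ismPitch --><p>after</p>"
)
fragment = extract_between(sample, "<!-- pitch view -->", "<!-- end ismPitch -->")
```

The fragment can then be passed to BeautifulSoup exactly as before. Using len(start_marker) avoids the magic number 19, and returning None makes it obvious when the page layout has changed.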
I would implement your code above in this manner:
Output of the above code is