I’m having an interesting problem with Python and BeautifulSoup4.
My method fetches the day’s menu for a given local student restaurant (looked up by dict key) and then displays it.
import urllib
from bs4 import BeautifulSoup

def fetchFood(restaurant):
    # Restaurant ids
    restaurants = {
        'assari': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGMG4Agw',
        'delica': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGPnPAgw',
        'ict': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGPnMAww',
        'mikro': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGOqBAgw',
        'tottisalmi': 'restaurant_aghtdXJraW5hdHIaCxISX1Jlc3RhdXJhbnRNb2RlbFYzGMK7AQw',
    }
    if restaurant.lower() in restaurants:
        soup = BeautifulSoup(urllib.urlopen("http://murkinat.appspot.com"))
        # Find the restaurant's block by id, then collect its meal-name cells
        meal_div = soup.find(id=restaurants[restaurant.lower()]).find_all("td", "mealName hyphenate")
        mealstring = "%s: " % restaurant
        for meal in meal_div:
            mealstring += "%s / " % meal.string.strip()
        # Drop the trailing " / " and append the source URL
        mealstring = "%s @ %s" % (mealstring[:-3], "http://murkinat.appspot.com")
        return mealstring
    else:
        return "Restaurant not found"
It’s going to be part of my IRCBot. Currently it works only on my testing machine (Ubuntu 12.04 with Python 2.7.3); on the other machine, which runs the bot (Xubuntu with Python 2.6.5), it fails.
After the line
soup = BeautifulSoup(urllib.urlopen("http://murkinat.appspot.com"))
>>> type(soup)
<class 'bs4.BeautifulSoup'>
and I can print it and it shows all the content it is supposed to have, but it can’t find anything. If I do this:
>>> print soup.find(True)
None
>>> soup.get_text()
u'?xml version="1.0" encoding="utf-8" ?'
it stops reading after the first line, although on the other machine it reads everything perfectly.
The output should be like this (from the working machine with restaurant parameter “Tottisalmi” at this date):
Tottisalmi: Sveitsinleike, kermaperunat / Jauheliha-perunamusaka / Uuniperuna, kylmäsavulohitäytettä / Kermainen herkkusienikastike @ http://murkinat.appspot.com
I’m completely clueless about this. I have many similar BeautifulSoup parsing methods that work just fine on the bot (they parse URL titles and Wikipedia content), but this one keeps bugging me.
Does anyone have any idea? The only explanation I can come up with is my Python version, which sounds odd since BeautifulSoup4 works fine everywhere else.
I believe you have different parsers installed on the two machines. The html5lib parser fails on the given markup, which produces the behavior you describe. The lxml and html.parser parsers parse the markup correctly and don’t show the problem.
When writing code that will be run on multiple machines, it’s best to explicitly state which parser you want to use by passing its name as the second argument to the BeautifulSoup constructor.
This way you’ll get an error if the appropriate parser isn’t installed.
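As a rough sketch of what naming the parser looks like, the snippet below parses a small stand-in document with a parser chosen by name. The MARKUP string and the parse_with helper are made up for illustration and are not taken from the real page. The only assumption about the page is that it starts with an XML declaration, as the get_text() output in the question suggests.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the real page: an XML declaration followed by HTML.
MARKUP = (
    '<?xml version="1.0" encoding="utf-8" ?>'
    '<html><body><table><tr>'
    '<td class="mealName hyphenate"> Uuniperuna </td>'
    '</tr></table></body></html>'
)

def parse_with(parser_name):
    """Return the meal names found by the named parser, or None if it is not installed."""
    try:
        # The second argument selects the parser explicitly instead of
        # letting BeautifulSoup pick whichever one happens to be installed.
        soup = BeautifulSoup(MARKUP, parser_name)
    except FeatureNotFound:
        return None
    cells = soup.find_all("td", "mealName hyphenate")
    return [cell.get_text().strip() for cell in cells]

# "html.parser" ships with Python itself, so it is available wherever bs4 is.
meals = parse_with("html.parser")
print(meals)
```

Calling parse_with("lxml") or parse_with("html5lib") on each machine should show which parsers are actually installed there, since a missing one makes the helper return None instead of silently falling back to another parser.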