A similar problem was covered in this post: Python web scraping involving HTML tags with attributes
But I haven’t been able to do something similar for this web page: http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland?
I’m trying to scrape the values of:
```html
<td class="price city-2">
  NZ$15.62
  <span style="white-space:nowrap;">(AU$12.10)</span>
</td>
<td class="price city-1">
  AU$15.82
</td>
```
That is, the prices in the price city-2 and price city-1 cells (NZ$15.62 and AU$15.82).
Here is what I currently have:
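```python
import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)

url = "http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland?"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)

# Every <td> whose class attribute is exactly "price city-2" / "price city-1"
price2 = soup.findAll('td', attrs={'class': 'price city-2'})
price1 = soup.findAll('td', attrs={'class': 'price city-1'})

# Each item is a Tag, so this prints the whole <td>...</td> markup,
# not just the price text.
for price in price2:
    print price
for price in price1:
    print price
```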
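For reference, a minimal Python 3 sketch of pulling just the price text out of these cells. It swaps the standard library's html.parser in for BeautifulSoup (so there is no third-party dependency) and skips any text inside nested tags, such as the (AU$12.10) span. The class names come from the snippet in the question; the fetch helper assumes the page is UTF-8, and the live page's markup may well have changed since, so treat this as a sketch rather than a tested scraper.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "wbr"}


def fetch(url):
    # Assumes the page is UTF-8; not called in the example below.
    with urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")


class PriceParser(HTMLParser):
    """Collect the direct text of <td class="price city-N"> cells.

    Text inside nested tags, such as the "(AU$12.10)" <span>, is skipped,
    so each cell yields only its own price string.
    """

    def __init__(self):
        super().__init__()
        self.prices = {"city-1": [], "city-2": []}
        self._city = None  # price column we are currently inside, if any
        self._depth = 0    # tags nested inside the current cell
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._city is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        if tag != "td":
            return
        # class="price city-2" is a list of two classes; match them in any order.
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            for city in self.prices:
                if city in classes:
                    self._city, self._depth, self._text = city, 0, []

    def handle_endtag(self, tag):
        if self._city is None:
            return
        if self._depth > 0:
            self._depth -= 1
        elif tag == "td":
            self.prices[self._city].append("".join(self._text).strip())
            self._city = None

    def handle_data(self, data):
        if self._city is not None and self._depth == 0:
            self._text.append(data)


SAMPLE = """<tr>
<td class="price city-2">
NZ$15.62
<span style="white-space:nowrap;">(AU$12.10)</span>
</td>
<td class="price city-1">
AU$15.82
</td>
</tr>"""

parser = PriceParser()
parser.feed(SAMPLE)  # or parser.feed(fetch(url)) for the live page
print(parser.prices)
# {'city-1': ['AU$15.82'], 'city-2': ['NZ$15.62']}
```

Tracking nesting depth, instead of just grabbing all text in the cell, is what drops the converted "(AU$12.10)" figure while keeping the local-currency price.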
Ideally, I’d also like comma-separated output. For each item, that means extracting the category from the header cell (‘Food’ here):

```html
<th colspan="3" class="clickable">Food</th>
```

the item name (‘Daily menu in the business district’):

```html
<td class="item-name">Daily menu in the business district</td>
```

and then the price city-2 and price city-1 values. So the printout would be:
Food, Daily menu in the business district, NZ$15.62, AU$15.82
Thanks!
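A sketch of the comma-separated output, again in Python 3 with html.parser standing in for BeautifulSoup. It assumes, hypothetically and without checking the live page, that each category header row (th.clickable) comes before the item rows it labels, and that every item row holds td.item-name plus the two price cells. SAMPLE is a hand-built table assembled from the fragments quoted in the question, not the real page.

```python
import csv
import io
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "wbr"}


class CostTableParser(HTMLParser):
    """Collect [category, item, city-2 price, city-1 price] rows.

    Hypothetical layout assumption (not checked against the live page):
    a <th class="clickable"> header row names the category, and each
    following <tr> holds td.item-name, td.price.city-2 and td.price.city-1.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._category = None
        self._row = {}
        self._field = None  # "category", "item", "city-1" or "city-2"
        self._depth = 0     # tags nested inside the current cell
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        if tag == "tr":
            self._row = {}
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "th" and "clickable" in classes:
            self._field = "category"
        elif tag == "td" and "item-name" in classes:
            self._field = "item"
        elif tag == "td" and "price" in classes:
            for city in ("city-1", "city-2"):
                if city in classes:
                    self._field = city
        self._depth = 0
        self._text = []

    def handle_endtag(self, tag):
        if self._field is not None:
            if self._depth > 0:
                self._depth -= 1
            elif tag in ("td", "th"):
                value = "".join(self._text).strip()
                if self._field == "category":
                    self._category = value
                else:
                    self._row[self._field] = value
                self._field = None
        elif tag == "tr" and "item" in self._row:
            # End of an item row: emit it under the most recent category.
            self.rows.append([
                self._category,
                self._row["item"],
                self._row.get("city-2", ""),
                self._row.get("city-1", ""),
            ])
            self._row = {}

    def handle_data(self, data):
        # Keep only text directly inside the cell, not inside nested tags.
        if self._field is not None and self._depth == 0:
            self._text.append(data)


def to_csv(html):
    parser = CostTableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


SAMPLE = """<table>
<tr><th colspan="3" class="clickable">Food</th></tr>
<tr>
<td class="item-name">Daily menu in the business district</td>
<td class="price city-2">
NZ$15.62
<span style="white-space:nowrap;">(AU$12.10)</span>
</td>
<td class="price city-1">
AU$15.82
</td>
</tr>
</table>"""

print(to_csv(SAMPLE), end="")
# Food,Daily menu in the business district,NZ$15.62,AU$15.82
```

csv.writer separates fields with a bare comma, which is standard CSV and quotes any field that itself contains a comma; use ", ".join(row) on parser.rows instead if you want the exact spacing shown in the question.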
I find BeautifulSoup awkward to use. Here is a version based on the webscraping module:
Output: