I am trying to scrape data from this webpage http://www.verizonwireless.com/wcms/consumer/shop/share-everything.html using the code below:
```python
# -*- coding: cp1252 -*-
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.verizonwireless.com/wcms/consumer/shop/share-everything.html'
user_agent = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1;Trident/5.0)'

# Send a browser-like User-Agent with the request
req = urllib2.Request(url, headers={'User-Agent': user_agent})
response = urllib2.urlopen(req)
page = response.read()

# Name the parser explicitly so bs4 does not warn or pick a different one per machine
soup = BeautifulSoup(page, 'html.parser')
tabcontent = soup.find('div', {"id": "uttsdPlanOptions", "class": "priceCol2"})
content = tabcontent.findAll('tr')
print content
```
After printing the content, I realised I am not getting the data values in GB that are shown on the website. When I inspected the “GB” element, I found this HTML structure: `<p class="ptData">Shareable Data</p>`. There is no mention of GB in it, and no linked image that could explain the missing GB value.
The value you are looking for doesn’t exist as text. It is an image loaded from the URL `/content/dam/vzw/lobs/consumer/shop/share-everything/data-sprite.png` and cropped to size using CSS:

The table you are trying to obtain the values from has records like this:
The `class` attribute of the `<p>` tag gets its image based on the `class` attribute of the enclosing `<td>` (the style is inherited from it). So you can derive the value you want from the `class` attribute of the `<td>` tag.
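A minimal sketch of that idea, using only the standard library's `html.parser` so it runs without bs4: it records the `class` of the `<td>` that encloses each `<p class="ptData">` and translates it into a data value. The class names (`gb1`, `gb2`) and the `CLASS_TO_DATA` mapping are hypothetical placeholders, since the page's actual markup isn't reproduced here; you would fill the mapping in by comparing each `<td>` class with the value the browser displays.

```python
from html.parser import HTMLParser

# Hypothetical mapping from <td> class names to displayed data allowances.
# The real class names must be read from the page's own markup.
CLASS_TO_DATA = {"gb1": "1GB", "gb2": "2GB"}


class PlanDataParser(HTMLParser):
    """Record the classes of the <td> enclosing each <p class="ptData">."""

    def __init__(self):
        super().__init__()
        self._open_tds = []   # class lists of <td> tags not yet closed
        self.td_classes = []  # one entry per ptData paragraph, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "td":
            self._open_tds.append(classes)
        elif tag == "p" and "ptData" in classes and self._open_tds:
            # The innermost open <td> is the one that styles this paragraph
            self.td_classes.append(self._open_tds[-1])

    def handle_endtag(self, tag):
        if tag == "td" and self._open_tds:
            self._open_tds.pop()


def data_values(html, mapping=CLASS_TO_DATA):
    """Translate each ptData cell's <td> classes into a value via `mapping`.

    Cells whose classes are not in the mapping fall back to the raw class string.
    """
    parser = PlanDataParser()
    parser.feed(html)
    parser.close()
    values = []
    for classes in parser.td_classes:
        known = [mapping[c] for c in classes if c in mapping]
        values.append(known[0] if known else " ".join(classes))
    return values


sample = (
    '<table><tr>'
    '<td class="gb1"><p class="ptData">Shareable Data</p></td>'
    '<td class="plan gb2"><p class="ptData">Shareable Data</p></td>'
    '</tr></table>'
)
print(data_values(sample))  # ['1GB', '2GB']
```

With BeautifulSoup, as in the question, calling `p.find_parent('td').get('class')` on each result of `soup.find_all('p', class_='ptData')` gives the same information, as a list of class names.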