I am scraping two sets of data from a website using Beautiful Soup, and I want to write them to a CSV file in two columns side by side. I am using spamwriter.writerow([x,y]) for this, but I think that because of some error in my loop structure I am getting the wrong output in my CSV file. Below is the code in question:
import csv
import urllib2
import sys
from bs4 import BeautifulSoup

page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
with open('Smartphones_20decv2.0.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"},text=True):
        if anchor.string:
            print unicode(anchor.string).encode('utf8').strip()

    for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
        textcontent = u' '.join(anchor1.stripped_strings)
        if textcontent:
            print textcontent
            spamwriter.writerow([unicode(anchor.string).encode('utf8').strip(),textcontent])
The output I am getting in the CSV file is:
Samsung Focus® 2 (Refurbished) $99.99
Samsung Focus® 2 (Refurbished) $99.99 to $199.99 8 to 16 GB
Samsung Focus® 2 (Refurbished) $0.99
Samsung Focus® 2 (Refurbished) $0.99
Samsung Focus® 2 (Refurbished) $149.99 to $349.99 16 to 64 GB
The problem is that I am getting only one device name in column 1 instead of all of them, while the prices are written for all devices.
Please pardon my ignorance, as I am new to programming.
You are using anchor.string, instead of anchor1. At that point, anchor is the last item from the previous loop, instead of the item in the current loop.

Perhaps using clearer variable names would help avoid confusion here; use singleitem and gridprice perhaps?

It could be I misunderstood though, and you want to combine each anchor1 with a corresponding anchor. You'll have to loop over them together, perhaps using zip().

Normally it should be easier to loop over the parent table row instead, then find the cells within that row inside the loop. But zip() should work too, provided the clickStreamSingleItem cells line up with the listGrid-price matches.
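
A minimal sketch of the zip() pairing, assuming the two result lists line up one-to-one. The helper name write_pairs and the sample names are made up for illustration, and the prices are copied from the output in the question. It is written for Python 3 with an in-memory buffer so it runs without fetching the page; it is not a drop-in replacement for the scraper.

```python
import csv
import io


def write_pairs(names, prices, out):
    """Write one CSV row per (name, price) pair to the file-like object out."""
    writer = csv.writer(out, delimiter=',')
    # zip() walks both lists in lockstep and stops at the shorter one,
    # so the two lists must line up item for item.
    for name, price in zip(names, prices):
        writer.writerow([name.strip(), price])


# Hypothetical stand-ins for the scraped values. In the question's code they
# would be built from the two soup.findAll() calls.
names = ['Device A', 'Device B', 'Device C']
prices = ['$99.99', '$99.99 to $199.99 8 to 16 GB', '$0.99']

buf = io.StringIO()
write_pairs(names, prices, buf)
print(buf.getvalue())
```

With the scraper, names would be the anchor.string values and prices the joined stripped_strings of each listGrid-price div. If the two lists can differ in length, zip() silently drops the extras, so comparing len(names) and len(prices) first is probably worthwhile.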