I am trying to create a CSV file with a list of URLs.
I am pretty new to programming, so please excuse any sloppy code.
I have a loop that runs through a list of places to get the list of URLs.
I then have a loop within that loop that exports the data to a CSV file.
import urllib, csv, re
from BeautifulSoup import BeautifulSoup
list_of_URLs = csv.reader(open("file_location_for_URLs_to_parse"))
for row in list_of_URLs:
    row_string = "".join(row)
    file = urllib.urlopen(row_string)
    page_HTML = file.read()
    soup = BeautifulSoup(page_HTML) # parsing HTML
    Thumbnail_image = soup.findAll("div", {"class": "remositorythumbnail"})
    Thumbnail_image_string = str(Thumbnail_image)
    soup_3 = BeautifulSoup(Thumbnail_image_string)
    Thumbnail_image_URL = soup_3.findAll('a', attrs={'href': re.compile("^http://")})
This is the part that isn’t working for me:
out = csv.writer(open("file_location", "wb"), delimiter=";")
for tag in soup_3.findAll('a', href=True):
    out.writerow(tag['href'])
Basically, the writer keeps overwriting its own output. Is there a way to jump to the first empty row of the CSV and start writing there?
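Are you opening the file before every write? Check that first: opening a file in "wb" mode truncates it, so if the csv.writer is created inside the loop, every pass starts from an empty file and overwrites the rows from the previous pass. Open the file once, before the loop, and keep writing to the same writer.
Alternatively, use "ab" mode instead of "wb". "ab" appends to the end of the file instead of truncating it.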
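Here is a minimal sketch of both suggestions. It makes a few assumptions: it is written for Python 3, where the csv module wants text modes "w" and "a" with newline="" rather than the Python 2 modes "wb" and "ab"; the function names write_links_once and append_links are made up for illustration; and the pages list is a hypothetical stand-in for the hrefs that the scraping loop would produce, so the example runs without any network access.

```python
import csv
import os
import tempfile


def write_links_once(path, pages):
    """Open the output file a single time, then write one row per link."""
    # Python 3 text mode with newline="" (Python 2 would use "wb" here).
    with open(path, "w", newline="") as handle:
        out = csv.writer(handle, delimiter=";")
        for links in pages:           # outer loop: one entry per parsed page
            for href in links:        # inner loop: one row per link
                out.writerow([href])  # a list keeps the whole URL in one cell


def append_links(path, links):
    """Reopen the file in append mode so the existing rows are kept."""
    # Python 3 "a" with newline="" (Python 2 would use "ab" here).
    with open(path, "a", newline="") as handle:
        out = csv.writer(handle, delimiter=";")
        for href in links:
            out.writerow([href])


# Hypothetical stand-in for the hrefs scraped from two pages.
pages = [
    ["http://example.com/a.jpg"],
    ["http://example.com/b.jpg", "http://example.com/c.jpg"],
]
demo_path = os.path.join(tempfile.mkdtemp(), "links.csv")

write_links_once(demo_path, pages)                      # writer created before the loops
append_links(demo_path, ["http://example.com/d.jpg"])   # later write, nothing lost

with open(demo_path, newline="") as handle:
    rows = [row[0] for row in csv.reader(handle, delimiter=";")]
print(rows)
```

Note that writerow expects a sequence of fields. Passing a bare string such as tag['href'] makes the writer treat every character as its own column, which is why the sketch wraps each URL in a one-element list.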