I’m scraping a list of URLs from a webpage using urllib2 and BeautifulSoup, and I’m trying to write them to a csv file. I have tried writing the links both as unicode and encoded as utf-8. In both cases, each letter ends up in its own field.
Here’s my code (I’ve tried it at least these two ways):
f = open('filename','wb')
w = csv.writer(f,delimiter=',')
for link in links:
    w.writerow(link['href'])
And:
f = open('filename','wb')
w = csv.writer(f,delimiter=',')
for link in links:
    w.writerow(link['href'].encode('utf-8'))
links is a list that looks like this:
[<a href="#Flyout1" accesskey="2" class="quicklinks" tabindex="1" title="Skip to content">Quick Links: Skip to main page content</a>, <a href="#search" class="quicklinks" tabindex="1" title="Skip to search">Skip to Search</a>, <a href="#News" class="quicklinks" tabindex="1" title="Skip to Section table of contents">Skip to Section Content Menu</a>, <a href="#footer" class="quicklinks" tabindex="1" title="Skip to site options">Skip to Common Links</a>, <a href="http://www.hhs.gov"><img src="/ucm/groups/fdagov-public/@system/documents/system/img_fdagov_hhs_gov.png" alt="www.hhs.gov link" style="width:112px; height:18px;" border="0" /></a>]
Not all the links have an 'href' key but I check for that in code not shown here. In both cases, the correct strings are written to the csv file, but each letter is in a new field.
Any thoughts?
From the docs: “A row must be a sequence of strings or numbers …” You are passing a single string, not a sequence of strings, so it treats each letter as an item. Put your string in a list.
So change
w.writerow(link['href'])
to
w.writerow([link['href']])
Note: A csv file with a single column looks exactly like a flat text file. Maybe you don’t need csv.
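To see the difference side by side, here is a minimal sketch. It is not the asker's code: it assumes Python 3 and writes to an in-memory io.StringIO buffer instead of a file, and both the sample hrefs list and the rows_as_csv helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for the scraped href values.
hrefs = ["#search", "http://www.hhs.gov"]


def rows_as_csv(rows):
    """Write each row with csv.writer and return the resulting text."""
    buf = io.StringIO()
    w = csv.writer(buf, delimiter=",")
    for row in rows:
        w.writerow(row)
    return buf.getvalue()


# A bare string is itself a sequence, so csv.writer iterates over it
# and every character becomes a separate field.
split_output = rows_as_csv(hrefs)

# Wrapping each string in a one-element list gives one field per row.
fixed_output = rows_as_csv([[h] for h in hrefs])

print(split_output)
print(fixed_output)
```

The only change between the two calls is the extra pair of brackets around each string, which is the same change as the one suggested for w.writerow above.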