I have a bunch of CSV files (only two in the example below). Each CSV file has 6 columns. I want to go into each CSV file, copy the first two columns and add them as new columns to an existing CSV file.
Thus far I have:
import csv

f = open('combined.csv')
data = [item for item in csv.reader(f)]
f.close()

for x in range(1,3): #example has 2 csv files, this will be automated
    n=0
    while n<2:
        f=open(str(x)+".csv")
        new_column=[item[n] for item in csv.reader(f)]
        f.close()
        #print d

        new_data = []

        for i, item in enumerate(data):
            try:
                item.append(new_column[i])
                print i
            except IndexError, e:
                item.append("")
            new_data.append(item)

        f = open('combined.csv', 'w')
        csv.writer(f).writerows(new_data)
        f.close()
        n=n+1
This works, it is not pretty, but it works.
However, I have three minor annoyances:
- I open each CSV file twice (once for each column), which is hardly elegant.
- When I print the combined.csv file, it prints an empty row following each row?
- I have to provide a combined.csv file that has at least as many rows in it as the largest file I may have. Since I do not really know what that number may be, that kinda sucks.
As always, any help is much appreciated!!
As requested:
1.csv looks like (mock data)
1,a
2,b
3,c
4,d
2.csv looks like
5,e
6,f
7,g
8,h
9,i
the combined.csv file should look like
1,a,5,e
2,b,6,f
3,c,7,g
4,d,8,h
,,9,i
The line

    for rows in IT.izip_longest(*readers, fillvalue=['']*2):

can be understood with an example:
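A minimal sketch of such an example, assuming `IT` is an alias for the `itertools` module and using three made-up sequences of unequal length as stand-ins for the csv readers. The function is called `izip_longest` on Python 2 and `zip_longest` on Python 3, so the import below tries both.

```python
# izip_longest on Python 2, zip_longest on Python 3.
try:
    from itertools import izip_longest
except ImportError:
    from itertools import zip_longest as izip_longest

# Hypothetical stand-ins for the csv readers: sequences of unequal length.
readers = [(1, 2, 3), ('a', 'b', 'c', 'd'), (10, 20, 30, 40)]

# Without fillvalue, the shorter sequence is padded with None.
default_fill = list(izip_longest(readers[0], readers[1], readers[2]))
print(default_fill)
# [(1, 'a', 10), (2, 'b', 20), (3, 'c', 30), (None, 'd', 40)]

# With fillvalue, the given object is used as padding instead.
custom_fill = list(izip_longest(readers[0], readers[1], readers[2], fillvalue=''))
print(custom_fill)
# [(1, 'a', 10), (2, 'b', 20), (3, 'c', 30), ('', 'd', 40)]
```

In the csv case each item is a two-element row, which is presumably why the fill value is `['']*2`: a missing row is replaced by two empty fields.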
As you can see, IT.izip_longest behaves very much like zip, except that it does not stop until the longest iterable is consumed. It fills in missing items with None by default.

Now what happens if there were more than 3 items in readers? We would want to write something like IT.izip_longest(readers[0], readers[1], readers[2], ...), but that’s laborious, and if we did not know len(readers) in advance, we wouldn’t even be able to replace the ellipsis (...) with something explicit.

Python has a solution for this: the star (aka argument unpacking) syntax:
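A small sketch of the star syntax, again with made-up sequences standing in for the readers (the data and variable names here are illustrative assumptions). It compares spelling out the arguments by hand with unpacking the list.

```python
# izip_longest on Python 2, zip_longest on Python 3.
try:
    from itertools import izip_longest
except ImportError:
    from itertools import zip_longest as izip_longest

# Hypothetical stand-ins for the csv readers.
readers = [(1, 2, 3), ('a', 'b', 'c', 'd'), (10, 20, 30, 40)]

# Arguments written out one by one; only possible if len(readers) is known.
explicit = list(izip_longest(readers[0], readers[1], readers[2]))

# Same call with argument unpacking; works for any len(readers).
unpacked = list(izip_longest(*readers))

print(explicit == unpacked)  # True

# Adding a fourth sequence requires no change to the call itself.
readers.append(('x', 'y'))
four_way = list(izip_longest(*readers))
print(four_way)
```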
Notice the result Out[4] is identical to the result Out[3].

The *readers tells Python to unpack the items in readers and send them along as individual arguments to IT.izip_longest. This is how Python allows us to send an arbitrary number of arguments to a function.
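Putting the pieces together, here is a rough sketch of how the whole job could look. It is written for Python 3 (where the function is `zip_longest`), and the helper name `combine_first_columns`, its `ncols` parameter, and the temporary directory are assumptions made for illustration, not part of the original code. Each input file is opened once, the output needs no pre-existing rows, and opening the files with `newline=''` is what the csv module expects on Python 3 to avoid the extra blank rows (on Python 2 the usual equivalent is opening in binary mode, e.g. `'wb'`).

```python
import csv
import os
import tempfile
from itertools import zip_longest  # named izip_longest on Python 2


def combine_first_columns(in_paths, out_path, ncols=2):
    """Write the first `ncols` columns of each input CSV side by side."""
    files = [open(path, newline='') for path in in_paths]
    try:
        # One generator per file, yielding the first `ncols` fields of each row.
        readers = [(row[:ncols] for row in csv.reader(f)) for f in files]
        with open(out_path, 'w', newline='') as out:
            writer = csv.writer(out)
            # Files that run out early are padded with `ncols` empty fields.
            for rows in zip_longest(*readers, fillvalue=[''] * ncols):
                writer.writerow([field for row in rows for field in row])
    finally:
        for f in files:
            f.close()


# Usage with the mock data from the question, inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, '1.csv')
    two = os.path.join(tmp, '2.csv')
    with open(one, 'w', newline='') as f:
        csv.writer(f).writerows([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']])
    with open(two, 'w', newline='') as f:
        csv.writer(f).writerows([[5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']])

    combined_path = os.path.join(tmp, 'combined.csv')
    combine_first_columns([one, two], combined_path)

    with open(combined_path, newline='') as f:
        result = list(csv.reader(f))

for row in result:
    print(','.join(row))
```

With the mock files this should print the five rows shown in the question, the last one being `,,9,i`.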