I have a large CSV file with about 200 header names, the first of which is empty.
I want to pick some chosen columns and copy them to a new output.csv file. My problem comes when grabbing the header that has no name (the empty first element in the header).
So input.csv looks something like this:
,header1,header2,header3,header4, ... , header200
value0, value1, value2, value3, value4, ..., value200
,2,3,30,,, ... , 10
66,2,3,30,, ... , 10
etc. (All rows have the same number of elements, even if some are empty.)
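A quick way to see what the csv module actually returns for that header row is to parse a small in-memory sample. This is a minimal sketch assuming Python 3; the sample text is made up to mirror the layout above, not taken from the real file.

```python
import csv
import io

# Hypothetical stand-in for input.csv: empty first header, then named headers.
sample = io.StringIO(",header1,header2,header3\n66,2,3,30\n")

reader = csv.reader(sample)
header = next(reader)  # first row = list of header names

# The nameless first header comes back as a zero-length string.
print(repr(header[0]))  # ''
print(header)           # ['', 'header1', 'header2', 'header3']
```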
After reading various questions, I've recycled some code from
"write CSV columns out in a different order in Python"
to write:
import csv
from operator import itemgetter
SelectedSignals = ['header1', 'header4']
fiin=open('input.csv','rb') #open to read "r" in binary mode "b"
fiout=open('output.csv','wb') #open to write "w" in binary mode "b"
reader = csv.reader(fiin, delimiter=',')
writer = csv.writer(fiout, delimiter=',')
AllSignalNames = reader.next()
name2index = dict((name, index) for index, name in enumerate(AllSignalNames))
writeindices = [name2index[name] for name in SelectedSignals]
reorderfunc = itemgetter(*writeindices) # itemgetter was imported from operator module
writer.writerow(SelectedSignals)
for row in reader:
    writer.writerow(reorderfunc(row))
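For reference, here is a minimal sketch of the same approach for Python 3, where files are opened in text mode with newline='' rather than 'rb'/'wb', and next(reader) replaces reader.next(). The function name select_columns and the sample data are hypothetical. One deliberate change: each output row is built with a list comprehension instead of itemgetter, because itemgetter with a single index returns a bare string, which writer.writerow would then split into one column per character.

```python
import csv
import io

def select_columns(fin, fout, selected):
    """Copy the columns named in `selected` from fin to fout, in that order.

    select_columns is a hypothetical helper name, not part of the csv module.
    """
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    all_names = next(reader)  # header row
    name2index = {name: i for i, name in enumerate(all_names)}
    indices = [name2index[name] for name in selected]  # KeyError if a name is missing
    writer.writerow(selected)
    for row in reader:
        # A list keeps single-column selections as one field per row.
        writer.writerow([row[i] for i in indices])

# Made-up sample data in the same shape as input.csv.
sample = ",header1,header2,header3,header4\n66,2,3,30,7\n,5,6,8,9\n"
src = io.StringIO(sample)
dst = io.StringIO()
select_columns(src, dst, ['header1', 'header4'])
print(dst.getvalue())
```

With real files, the call would look something like `with open('input.csv', newline='') as fin, open('output.csv', 'w', newline='') as fout: select_columns(fin, fout, SelectedSignals)`.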
This gives the desired output,
for example:
,header1,header4
value0, value4
,30
66,30
But the problem is that doing
SelectedSignals = [' ', 'header1', 'header4']
to grab the first column raises a KeyError.
I'm a Python beginner, so any hints are appreciated.
In the CSV format, the first header is a zero-length string ('') rather than a space (' '), which is what you use in SelectedSignals.

You could also add a fake column name to your name2index dict, for example name2index['header0'] = 0 just after name2index = ..., and then use 'header0' in SelectedSignals.

Alternatively, you could use a default value for the dict lookup (when it can't find the header you want, it falls back to this default value): use name2index.get(name, 0) instead of name2index[name] in your writeindices expression.
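The three options can be compared side by side on a small sample. This sketch assumes Python 3 and made-up data; 'header0' is the hypothetical alias from the second option, not a name that appears in the file. All three resolve to the same column indices.

```python
import csv
import io

# Hypothetical sample standing in for input.csv.
sample = ",header1,header2,header3,header4\n66,2,3,30,7\n,5,6,8,9\n"

reader = csv.reader(io.StringIO(sample))
all_names = next(reader)
name2index = {name: i for i, name in enumerate(all_names)}

# Option 1: ask for the empty string, not a space.
indices1 = [name2index[name] for name in ['', 'header1', 'header4']]

# Option 2: register a made-up alias ('header0') for column 0 and use that.
name2index['header0'] = 0
indices2 = [name2index[name] for name in ['header0', 'header1', 'header4']]

# Option 3: fall back to column 0 for any unknown name (including ' ').
# Caveat: a misspelled header name would also silently map to column 0.
indices3 = [name2index.get(name, 0) for name in [' ', 'header1', 'header4']]

print(indices1, indices2, indices3)  # [0, 1, 4] [0, 1, 4] [0, 1, 4]
```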