I have the following code:
reader=csv.DictReader(open("test1.csv","r"))
allrows = list(reader)
keepcols = [c for c in allrows[0] if all(r[c] != '0' for r in allrows)]
print keepcols
writer=csv.DictWriter(open("output1.csv","w"),fieldnames='keepcols',extrasaction='ignore')
writer.writerows(allrows)
I have a CSV file which has about 45 columns.
The first column has some names.
Except for the first column, all the others contain only 0’s and 1’s.
And of course, the whole table has some titles as well.
I’m trying to read columns from the CSV file, and I need to extract only those columns with 1’s.
The problem is that the output file is empty, even though there are a few columns in the table with 1’s.
Could somebody please help me out? 🙁 I’m terribly stuck.
Title 3003_contact 3003_backbone 3003_sidechain 3003_polar 3003_hydrophobic 3003_acceptor 3003_donor 3003_aromatic
l1 1 1 0 1 1 0 0 0
l1 1 0 1 0 0 0 1 0
l1 1 0 0 0 0 0 0 0
l1 1 0 0 0 1 0 0 1
l1 1 0 0 0 0 0 0 0
l2 1 0 0 0 1 0 0 0
l2 1 0 0 0 0 1 0 0
l3 1 0 0 0 0 0 0 0
l3 1 0 0 0 0 0 1 0
l3 1 0 0 0 0 0 0 1
l3 1 0 0 0 0 0 0 0
l3 1 0 0 0 0 0 0 0
l4 1 0 0 0 0 0 0 0
l4 1 0 0 0 0 0 0 0
l4 1 0 0 0 0 0 0 0
It returns only column 1. I’ve tried changing 'keepcols' to keepcols, and then I get column 2 first and column 1 second in the output.
Edit: If the input file is a comma-separated values file, then to maintain the order of the keys, use `reader.fieldnames` instead of the keys in `allrows[0]`. So the solution is to build `keepcols` by iterating over `reader.fieldnames`.
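
A minimal sketch of that `csv`-based fix, written for Python 3 (the question's `print keepcols` is Python 2). The function name `keep_nonzero_columns`, the in-memory sample data, and the `writeheader()` call are assumptions added for illustration; the column test is the one from the question.

```python
import csv
import io


def keep_nonzero_columns(infile, outfile):
    """Copy only the columns in which no row holds '0' (the question's test)."""
    reader = csv.DictReader(infile)
    allrows = list(reader)
    # Iterate over reader.fieldnames (a list) so the original column order is kept.
    keepcols = [c for c in reader.fieldnames
                if all(r[c] != '0' for r in allrows)]
    writer = csv.DictWriter(outfile, fieldnames=keepcols, extrasaction='ignore')
    writer.writeheader()  # assumption: not in the question's code, added so the titles survive
    writer.writerows(allrows)
    return keepcols


# Hypothetical comma-separated sample, kept in memory instead of test1.csv.
sample = "Title,a,b\nl1,1,0\nl2,1,1\n"
out = io.StringIO()
kept = keep_nonzero_columns(io.StringIO(sample), out)
print(kept)
print(out.getvalue())
```

With real files in Python 3, open both with `newline=''`, as the `csv` module documentation recommends.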
The input file posted above looks like it has space-separated columns. In this case, I don’t think `csv` is the right tool for parsing it. Instead, you can use `split`.
Edit2: The reason why the column order was changing is that `for c in allrows[0]` returns the keys of `allrows[0]` in an unspecified order. `dict` keys are not ordered by default in the Python 2 used here (since Python 3.7, a `dict` preserves insertion order). The `split`-based approach works around this by defining `fields` to be a list, not a `dict`.
Original answer:
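
The `split`-based approach for whitespace-separated input could look roughly like the sketch below (Python 3). The function name `keep_nonzero_columns_split` and the shortened sample are assumptions for illustration, and the code assumes every row has as many fields as the header and that no value contains whitespace.

```python
def keep_nonzero_columns_split(lines):
    """Return the input lines with only the columns that never contain '0'."""
    rows = [line.split() for line in lines if line.strip()]
    fields, data = rows[0], rows[1:]  # fields is a list, so column order is preserved
    # Keep a column index only if no data row holds '0' there (the question's test).
    keep = [i for i in range(len(fields))
            if all(row[i] != '0' for row in data)]
    return [' '.join(row[i] for i in keep) for row in rows]


# A shortened, hypothetical excerpt of the table posted in the question.
sample_lines = [
    "Title 3003_contact 3003_backbone 3003_sidechain",
    "l1 1 1 0",
    "l1 1 0 1",
    "l2 1 0 0",
]
result = keep_nonzero_columns_split(sample_lines)
for line in result:
    print(line)
```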
Change `fieldnames='keepcols'` to `fieldnames=keepcols`. `fieldnames` needs to be a sequence of keys, such as `['fieldA','fieldB',...]`.
A potential pitfall to be aware of in Python is that strings are sequences. When you iterate over a string, you get the characters of the string. So when you say `fieldnames='keepcols'`, you are setting `fieldnames` to be the sequence of characters `['k','e','e','p','c','o','l','s']`. You don’t get an error because this is a valid sequence of keys. But your list of dicts, `allrows`, doesn’t happen to have these keys. `writer.writerows` blithely ignores this since `extrasaction='ignore'`, so every field is written out as an empty string.
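
To see the pitfall in isolation, here is a small Python 3 sketch comparing the quoted string with the list. The one-row `rows` list is a hypothetical stand-in for `allrows`, and `io.StringIO` replaces the output file.

```python
import csv
import io

rows = [{'Title': 'l1', '3003_contact': '1'}]  # hypothetical stand-in for allrows

# A string is a sequence of its characters.
chars = list('keepcols')

# Buggy call: the field names become 'k', 'e', 'e', 'p', ...; no row has those
# keys, so each field falls back to the default restval (an empty string).
buggy = io.StringIO()
csv.DictWriter(buggy, fieldnames='keepcols', extrasaction='ignore').writerows(rows)

# Fixed call: pass the list itself, not its name in quotes.
keepcols = ['Title', '3003_contact']
fixed = io.StringIO()
csv.DictWriter(fixed, fieldnames=keepcols, extrasaction='ignore').writerows(rows)

print(chars)
print(repr(buggy.getvalue()))  # only separators, no data
print(repr(fixed.getvalue()))  # the actual values
```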