So I’m dealing with a csv file that has missing values.
What I want my script to do is:
#!/usr/bin/python
import csv
import sys

#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for x in row[:]:
        if len(x) < 1:
            x = 0
        print x
print row
Here is an example of the data I’m trying it on. Ideally it should work on any column length.
Before:
actnum,col2,col4
xxxxx , ,
xxxxx , 845 ,
xxxxx , ,545
After:
actnum,col2,col4
xxxxx , 0 , 0
xxxxx , 845, 0
xxxxx , 0 ,545
Any guidance would be appreciated.
Update: Here is what I have now (thanks):
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
print row
However, it only seems to output one record. I will be piping the output to a new file on the command line.
Update 3: OK, now I have the opposite problem: I’m outputting duplicates of each record.
Why is that happening?
After
actnum,col2,col4
actnum,col2,col4
xxxxx , 0 , 0
xxxxx , 0 , 0
xxxxx , 845, 0
xxxxx , 845, 0
xxxxx , 0 ,545
xxxxx , 0 ,545
OK, I fixed it (below). Thank you guys for your help.
#!/usr/bin/python
import csv
import sys

#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
    print ','.join(str(x) for x in row)
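For anyone on Python 3, the same idea might look roughly like the sketch below. It is not a drop-in copy of the script above: it assumes that whitespace-only fields (such as the " " between the commas in the sample data) should also count as missing, it writes the string "0" rather than the number 0, and the names fill_blanks and convert are made up for illustration. It uses csv.writer for the output instead of joining the fields by hand, so quoting is handled by the csv module.

```python
import csv
import io
import sys


def fill_blanks(rows, fill="0"):
    """Yield each row with empty or whitespace-only fields replaced by fill."""
    for row in rows:
        yield [field if field.strip() else fill for field in row]


def convert(src, dst, fill="0"):
    """Copy CSV data from src to dst, filling in blank fields on the way."""
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(fill_blanks(csv.reader(src), fill))


# Demo on the sample data from the question, held in memory.
sample = io.StringIO("actnum,col2,col4\nxxxxx , ,\nxxxxx , 845 ,\nxxxxx , ,545\n")
result = io.StringIO()
convert(sample, result)
sys.stdout.write(result.getvalue())
```

With a real file, something like convert(open(sys.argv[1], newline=""), sys.stdout) should behave the same way, and the output can still be piped to a new file on the command line.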
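Change your code:
    for x in row[:]:
        if len(x) < 1:
            x = 0
        print x
into:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
        print x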
Not sure what you think you’re accomplishing by the print, but the key issue is that you need to modify row, and for that purpose you need an index into it, which enumerate gives you.

Note also that all other values, except the empty ones which you’re changing into the number 0, will remain strings. If you want to turn them into ints you have to do that explicitly.
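To make both points concrete, here is a small sketch written for Python 3. The sample row and the helper to_int_if_numeric are invented for illustration; how non-numeric fields such as the account number should be handled is an assumption, and here they are simply left as strings.

```python
row = ["xxxxx", "", "845"]

# Rebinding the loop variable leaves the list untouched.
for x in row:
    if len(x) < 1:
        x = 0
unchanged = list(row)

# Assigning through the index from enumerate changes the list itself.
for i, x in enumerate(row):
    if len(x) < 1:
        row[i] = 0
# row now holds a mix of str and int values.


def to_int_if_numeric(value):
    """Hypothetical helper: return int(value) when it parses, else the value."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return value


converted = [to_int_if_numeric(v) for v in row]
print(unchanged, row, converted)
```

The first loop shows why the original code had no effect on row, the second is the enumerate version, and the last step is one possible way to do the explicit conversion.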