Hi, here is my problem. I have a program that calculates the averages of data in columns.
Example
Bob
1
2
3
The output is:
Bob
2
Some of the data has NA values.
So for Joe:
Joe
NA
NA
NA
I want this output to be NA,
so I wrote an if/else block.
The problem is that it doesn’t seem to execute the second part of the if/else and just prints out one NA. Any suggestions?
Here is my program:
with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)
    numRowsPerColumn = [0] * len(columns)  # count of numeric values seen in each column
    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue
        values = line.split(" ")
        for i in xrange(len(values)):
            try:  # convert the string to a number; entries like NA raise ValueError
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue
with open('c://chipdone.txt', 'w') as ouf:
    for i in xrange(len(columns)):
        if numRowsPerColumn[i] == 0:
            print 'NA'
        else:
            print >>ouf, columns[i], sums[i] / numRowsPerColumn[i]  # average for this column
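A possible explanation for the single NA: the bare print in the if branch writes to the console, while only the else branch writes to ouf, so NA never reaches the output file. Below is a minimal sketch of the output step in Python 3 syntax (the post itself uses Python 2). The helper name write_averages and the use of io.StringIO in place of the real file are assumptions for illustration; the totals come from the Bob and Joe example at the top.

```python
import io


def write_averages(out, columns, sums, counts):
    """Write one 'name average' line per column, or 'name NA' if the column had no numbers."""
    for name, total, count in zip(columns, sums, counts):
        if count == 0:
            # Send NA to the same file as the averages, not to the console.
            print(name, "NA", file=out)
        else:
            print(name, total / count, file=out)


# Bob has the values 1, 2, 3; Joe has only NA entries.
# io.StringIO stands in for the real output file (an assumption for this sketch).
buf = io.StringIO()
write_averages(buf, ["Bob", "Joe"], [6.0, 0], [3, 0])
```

In Python 2 the equivalent of the NA line would be print >>ouf, columns[i], 'NA'.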
The file looks like so:
Joe Bob Sam
1 2 NA
2 4 NA
3 NA NA
1 1 NA
and the final output should be the names and the averages:
Joe Bob Sam
1.75 2.33 NA
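For reference, the averaging described here (skip NA entries, and report NA only when a column has no numbers at all) can be sketched as follows in Python 3 syntax. column_averages is a hypothetical helper, and io.StringIO stands in for the input file so the example is self-contained.

```python
import io

SAMPLE = """Joe Bob Sam
1 2 NA
2 4 NA
3 NA NA
1 1 NA
"""


def column_averages(lines):
    """Return (name, average) pairs; average is None when a column has no numeric values."""
    rows = iter(lines)
    columns = next(rows).split()
    sums = [0.0] * len(columns)
    counts = [0] * len(columns)
    for line in rows:
        # split() with no argument also makes blank lines yield no values
        for i, value in enumerate(line.split()):
            try:
                sums[i] += float(value)
            except ValueError:
                continue  # non-numeric entries such as NA are skipped
            counts[i] += 1
    return [(name, sums[i] / counts[i] if counts[i] else None)
            for i, name in enumerate(columns)]


averages = column_averages(io.StringIO(SAMPLE))
print(" ".join(name for name, _ in averages))
print(" ".join("NA" if avg is None else "%.2f" % avg for _, avg in averages))
```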
Ok I tried Roger’s suggestion and now I have this error:
Traceback (most recent call last):
  File "C:/avy14.py", line 5, in <module>
    for line in f:
ValueError: I/O operation on closed file
Here is the new code:
with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    sums = [0] * len(columns)
    rows = 0
for line in f:
    line = line.strip()
    if not line:
        continue
    rows += 1
    for col, v in enumerate(line.split()):
        if sums[col] is not None:
            if v == "NA":
                sums[col] = None
            else:
                sums[col] += int(v)
with open("c:/chipdone.txt", "w") as out:
    for name, sum in zip(columns, sums):
        print >>out, name,
        if sum is None:
            print >>out, "NA"
        else:
            print >>out, sum / rows
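The traceback is consistent with the for loop sitting outside the with block: the file is closed as soon as that block ends, so iterating over f afterwards fails. Here is a sketch of one way to arrange it, in Python 3 syntax, with io.StringIO objects standing in for the two files (an assumption made so the example is self-contained).

```python
import io

SAMPLE = "Joe Bob Sam\n1 2 NA\n2 4 NA\n3 NA NA\n1 1 NA\n"

# io.StringIO stands in for open('C://achip.txt'); like a real file,
# it is closed when the with block ends.
with io.StringIO(SAMPLE) as f:
    columns = f.readline().split()
    sums = [0] * len(columns)
    rows = 0
    for line in f:  # indented under with, so f is still open here
        line = line.strip()
        if not line:
            continue
        rows += 1
        for col, v in enumerate(line.split()):
            if sums[col] is not None:
                if v == "NA":
                    sums[col] = None  # a single NA marks the whole column as NA
                else:
                    sums[col] += int(v)

out = io.StringIO()  # stands in for open("c:/chipdone.txt", "w")
for name, total in zip(columns, sums):
    if total is None:
        print(name, "NA", file=out)
    else:
        print(name, total / rows, file=out)
```

Note that / is true division in Python 3. In Python 2, sum / rows with two integers truncates, so float(sum) / rows may be closer to what is wanted there.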
I’d also use the no-parameter version of split when getting the column names (it allows you to have multiple space separators).
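To illustrate the difference, here is a small sketch using a made-up header line that contains repeated spaces:

```python
header = "Joe  Bob   Sam\n"  # hypothetical header with repeated spaces

# split(" ") treats every single space as a separator, so runs of spaces
# produce empty strings in the result.
with_arg = header.strip().split(" ")

# split() with no argument splits on any run of whitespace and also
# ignores the trailing newline.
no_arg = header.split()

print(with_arg)
print(no_arg)
```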
Regarding your edit to include input/output sample, I kept your original format and my output would be:
This format is 3 rows of (ColumnName, Avg) columns, but you can change the output if you want, of course. 🙂