I am making a Python script that parses an Excel file using the xlrd library.
I would like to run calculations on certain columns when the cells contain a particular value, skip the rows otherwise, and store the result in a dictionary.
Here’s what I tried:
import xlrd

workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')
num_rows = worksheet.nrows - 1
num_cells = worksheet.ncols - 1
first_col = 0
scnd_col = 1
third_col = 2

# Read Data into double level dictionary
celldict = dict()
for curr_row in range(num_rows):
    cell0_val = int(worksheet.cell_value(curr_row+1, first_col))
    cell1_val = worksheet.cell_value(curr_row, scnd_col)
    cell2_val = worksheet.cell_value(curr_row, third_col)
    if cell1_val[:3] == 'BL1':
        if cell2_val == 'toSkip':
            continue
    elif cell1_val[:3] == 'OUT':
        if cell2_val == 'toSkip':
            continue
    if not cell0_val in celldict:
        celldict[cell0_val] = dict()
    # if the entry isn't in the second level dictionary then add it, with count 1
    if not cell1_val in celldict[cell0_val]:
        celldict[cell0_val][cell1_val] = 1
    # Otherwise increase the count
    else:
        celldict[cell0_val][cell1_val] += 1
As you can see, I count the occurrences of each “cell1_val” for each “cell0_val”. However, I would like to skip the rows that have “toSkip” in the adjacent column’s cell before counting and storing the result in the dict.
I am doing something wrong here, and I feel like the solution is much simpler.
Any help would be appreciated. Thanks.
Here’s an example of my sheet:
cell0  cell1  cell2
12     BL1    toSkip
12     BL1    doNotSkip
12     OUT3   doNotSkip
12     OUT3   toSkip
13     BL1    doNotSkip
13     BL1    toSkip
13     OUT3   doNotSkip
Use collections.defaultdict with collections.Counter for your nested dictionary.
Here it is in action:
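A minimal sketch of the idea, using made-up keys rather than data read from the workbook: a defaultdict whose default factory is Counter creates the inner counter on first access, and a Counter treats missing keys as 0, so neither level needs a membership check.

```python
from collections import Counter, defaultdict

# Outer dict creates an empty Counter for any key it has not seen yet
celldict = defaultdict(Counter)

# No "if key not in dict" checks needed at either level
celldict[12]['BL1'] += 1
celldict[12]['BL1'] += 1
celldict[13]['OUT3'] += 1

print(celldict[12]['BL1'])   # 2
print(celldict[13]['OUT3'])  # 1
print(celldict[12]['OUT3'])  # 0 (a Counter returns 0 for missing keys)
```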
Here it is integrated into your code:
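What follows is a sketch rather than a drop-in replacement. It assumes the sheet has a header row followed by data rows laid out as in your example, and the helper names read_rows and count_cells are hypothetical, chosen only to keep the counting logic separate from the file access.

```python
from collections import Counter, defaultdict


def count_cells(rows):
    """Count cell1 values per cell0 key, ignoring rows marked 'toSkip'."""
    celldict = defaultdict(Counter)
    for cell0_val, cell1_val, cell2_val in rows:
        # Combined check: skip BL1*/OUT* rows whose adjacent cell is 'toSkip'
        if cell1_val[:3] in ('BL1', 'OUT') and cell2_val == 'toSkip':
            continue
        # xlrd returns numeric cells as floats, hence the int() conversion
        celldict[int(cell0_val)][cell1_val] += 1
    return celldict


def read_rows(path, sheet_name):
    """Yield the first three cells of every data row in the sheet."""
    import xlrd  # imported here so count_cells works without xlrd installed
    worksheet = xlrd.open_workbook(path).sheet_by_name(sheet_name)
    # Start at 1 to skip the header row (an assumption about the sheet)
    for curr_row in range(1, worksheet.nrows):
        yield tuple(worksheet.cell_value(curr_row, col) for col in range(3))


if __name__ == '__main__':
    # File and sheet names taken from the question
    result = count_cells(read_rows('filter_data.xlsx', 'filter_data'))
    print({key: dict(counts) for key, counts in result.items()})
```

Because count_cells accepts any iterable of three-item rows, it can be tried on plain tuples before pointing it at the real workbook.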
I also combined your if-statements and changed the calculation of curr_row to be simpler.