I am trying to figure out a better way to extract data from a large number of files, run additional calculations on some data, and finally format it into a something readable by a spreadsheet application. Below is how I am currently doing this but I am convinced there must be an easier way.
First I create a definition that will search the files and extract the data. I use a definition since sometimes the data will be compiled from several locations. I am doing this via a mixture of regular expressions and find all. This can be formatted differently if needed. The result of this is something like:
RawData=
[[M1,A1,305.91,288.12,Variable_M1_A1],
[M1,A2,319.07,303.70,Variable_M1_A2],
[M2,A1,237437.32,191460.91,Variable_M2_A1],
[M2,A2,270773.28,192581.05,Variable_M2_A2]]
What I wish to do with some of the data is organize it in such a way that a grid is created from the variables A and M and then the data from a column is placed in the correct grid location. Realistically looks like a simple two dimension matrix with the first row and first column indicating the variables.
What the csv file should look like:
Placeholder,A1,A2
M1,Variable_M1_A1,Variable_M1_A2
M2,Variable_M2_A1,Variable_M2_A2
What I currently am doing is creating an empty matrix (3×3 in this case) calling it Result then running the following code. The basically iterates over all lines and all variables when the variables match, it assigns the Result matrix a value from the RawData.
MVar=[M1,M2]
AVar=[A1,A2]
for a in range(len(RawData):
for b in range(len(MVar)):
for c in range(len(AVar)):
if RawData[a][0]==Mvar[b] and RawData[a][1]==AVar[c]:
Result[b+1][c+1]=RawData[a][4]
I wrote a chunk of code that will take the RawData matrix and create the Result matrix based off all possible combinations of M1 and A1. If I want to outport this to .csv I simply use csv.writer. So this works great if I just want to organize the data that is already there. However when I want to do calculations with the data- such as generating another column that is based off values in the RawData that did not end up in the matrix- it becomes difficult. For example to take what the csv file output could look like:
Placeholder,A1,A2,NewA
M1,Variable_M1_A1,Variable_M1_A2,(RawData[0][3]*RawData[1][2])
M2,Variable_M2_A1,Variable_M2_A2,(RawData[2][3]*RawData[3][2])
Notice that the data for the calculations require the same M value but different A values. While this can be done it quickly becomes very convoluted.
Is there a simpler way to do this?
Edit:
Auto generation of the M and A list using:
[MethodList[i] for i,x in enumerate(MethodList) if x not in MethodList[i+1:]]
This seems makes it easier to work with, however its still a convolute process!
This was completed with the for loops and corresponding if conditions. I am still convinced there is a better way but this is entirely possible to do.