I just picked up pandas, hoping it would let me do data analysis nicely in Python. I now have a pandas DataFrame of the following form:
import pandas

# range replaces the Python 2-only xrange so this also runs on Python 3
pandas.DataFrame({"p1": [1, 1, 2, 2, 3, 3]*2,
                  "p2": [1]*6+[2]*6,
                  "run": [1, 2]*6,
                  "result": range(12)})
    p1  p2  result  run
0    1   1       0    1
1    1   1       1    2
2    2   1       2    1
3    2   1       3    2
4    3   1       4    1
5    3   1       5    2
6    1   2       6    1
7    1   2       7    2
8    2   2       8    1
9    2   2       9    2
10   3   2      10    1
11   3   2      11    2
I would like to generate a frame that contains one entry for each combination of the parameters p1 and p2, holding the average of all values of result for that combination, that is,
   p1  p2  result
0   1   1     0.5
1   2   1     2.5
2   3   1     4.5
3   1   2     6.5
4   2   2     8.5
5   3   2    10.5
What is the pandas way to do this? My idea would be to copy the original table, drop the columns that differ (result and run), reindex it, combine the two again using the new index as a MultiIndex, and then call the mean method on that outer index level. Is that the way to do it, and if so, how do I do the index handling properly in code?
You can use groupby (I have called your dataframe df):
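A minimal sketch of such a groupby call, assuming the frame from the question is bound to df and that grouping on p1 and p2 followed by mean is what is intended (the name grouped is only illustrative). Note that every remaining column is averaged, so run is averaged along with result:

```python
import pandas as pd

# The frame from the question, bound to the name df.
df = pd.DataFrame({"p1": [1, 1, 2, 2, 3, 3] * 2,
                   "p2": [1] * 6 + [2] * 6,
                   "run": [1, 2] * 6,
                   "result": range(12)})

# Collect rows that share the same (p1, p2) pair and average
# the remaining columns (run and result) within each group.
grouped = df.groupby(["p1", "p2"]).mean()
print(grouped)
```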
This results in a MultiIndex DataFrame. To get the layout in your question, select only the columns you want and reset the index:
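One way to sketch that last step, again assuming the frame is bound to df. Passing sort=False is my own addition, not something stated above: it keeps the groups in order of first appearance, which happens to match the row order of the layout in the question (by default groupby sorts by p1 first). The name means is illustrative:

```python
import pandas as pd

# The frame from the question, bound to the name df.
df = pd.DataFrame({"p1": [1, 1, 2, 2, 3, 3] * 2,
                   "p2": [1] * 6 + [2] * 6,
                   "run": [1, 2] * 6,
                   "result": range(12)})

# Average per (p1, p2) group, move the group keys from the index
# back into ordinary columns, then keep only the wanted columns.
# sort=False (an assumption here) preserves first-appearance order.
means = (df.groupby(["p1", "p2"], sort=False)
           .mean()
           .reset_index()[["p1", "p2", "result"]])
print(means)
```

Selecting df.groupby(["p1", "p2"], sort=False)["result"] before calling mean should give the same frame after reset_index while skipping the average of run.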