I am working in Python (pandas specifically) to analyze a dataset. (Python is awesome, and the power of open source is amazing.) I am having trouble with one specific part of my dataset.
I have the following dataset:
time,contract,ticker,expiry,strike,quote,price,volume
08:01:08,C,PXA,20100101,4000,A,57.8,60
08:01:11,C,PXA,20100101,4000,A,58.4,60
08:01:12,C,PXA,20100101,4000,A,58,60
08:01:16,C,PXA,20100101,4000,A,58.4,60
08:01:16,C,PXA,20100101,4000,A,58,60
08:01:21,C,PXA,20100101,4000,A,58.4,60
08:01:21,C,PXA,20100101,4000,A,58,60
and it goes on …
I am using pandas to load the data. After loading it, I would like to take a volume-weighted average of the price wherever there are duplicate timestamps.
For example, since there are two asks at time 08:01:16, I would like to calculate the volume-weighted average price, which would be (58.4*60 + 58*60)/(60+60), and the average of the volume column, which would be (60+60)/2.
1 Answer
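One possible approach, as a minimal sketch rather than a definitive solution: group the rows that share a timestamp and collapse each group into a single row. The sketch assumes that "duplicates" means rows where time, contract, ticker, expiry, strike and quote all match, and that the data is loaded from text in the format shown in the question. The helper column `notional` and the names `keys`, `total_volume` and `result` are made up for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question. For a real file, pass its path
# to pd.read_csv instead of an in-memory buffer.
csv_text = """time,contract,ticker,expiry,strike,quote,price,volume
08:01:08,C,PXA,20100101,4000,A,57.8,60
08:01:11,C,PXA,20100101,4000,A,58.4,60
08:01:12,C,PXA,20100101,4000,A,58,60
08:01:16,C,PXA,20100101,4000,A,58.4,60
08:01:16,C,PXA,20100101,4000,A,58,60
08:01:21,C,PXA,20100101,4000,A,58.4,60
08:01:21,C,PXA,20100101,4000,A,58,60
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: two rows are duplicates when all of these columns match.
keys = ["time", "contract", "ticker", "expiry", "strike", "quote"]

# price * volume per row, so that the weighted mean of a group is
# sum(price * volume) / sum(volume).
df["notional"] = df["price"] * df["volume"]

result = df.groupby(keys, sort=False).agg(
    notional=("notional", "sum"),
    total_volume=("volume", "sum"),
    volume=("volume", "mean"),  # plain average of the volume column
)

# Volume-weighted average price for each group.
result["price"] = result["notional"] / result["total_volume"]

# Drop the helper columns and restore the original column order.
result = result.drop(columns=["notional", "total_volume"]).reset_index()
result = result[keys + ["price", "volume"]]

print(result)
```

Rows with a unique timestamp pass through unchanged, because a weighted average over a single row is just that row's price. For the two asks at 08:01:16 this gives a price of about 58.2 and a volume of 60. If a group's total volume can be zero, the division needs separate handling.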