x
Team Date Score
A 1-1-2012 80
A 1-2-2012 90
A 1-3-2012 50
A 1-4-2012 40
B 1-1-2012 100
B 1-2-2012 60
B 1-3-2012 30
B 1-4-2012 70
etc
I need to and can turn this data frame to wide data frame one row for each team with all the observations and dates as the heading:
xx
Team 1-1-2012 1-2-2012 1-3-2012 1-4-2012
A 80 90 50 40
B 100 60 30 70
I need to calculate the mean and sd for each row, which I can do:
xx
Team 1-1-2012 1-2-2012 1-3-2012 1-4-2012 mean sd
A 80 90 50 40 75 20
B 100 60 30 70 55 10
Considering I have thousands of row in data frame xx. I would like to do calculation on each cell as this:
if abs(xx-Mean) > 3*SD, create a counter column name and increment the value. The idea is that compare each observation against the mean and sd, if each observation for a given team matches this – abs(xx-Mean) > 3*SD, increment the counter. After checking each cell, I would like to look at each counter for each team and get the top ten high team that has the highest counter value. Basically I am trying to detect the most outliers. Once I get the top 10 team names, I would like to graph their time series data on data frame x.
I hope I am not making this more complicated than it should be. Not sure, R already has function to do calculations on each cell. Any ideas how to accomplish this is appreciated?
A
long-format,data.tableapproachOr, in a single step
To add the number of outliers (sum on a logical variable will coerce to 0,1)