I have a sample data frame, sample.data, as follows:
x y z
1 0 1
1 0 1
1 0 1
1 0 1
1 0 2
1 0 2
1 0 2
1 0 2
1 0 2
0 1 2
I need to find the max and sum of x and y for each value of z (z takes values like 1, 2, … 600). I use ddply from plyr for this:
library(plyr)
z.group <- ddply(sample.data, .(z), summarize, max_x = max(x), max_y = max(y), sum_x = sum(x), sum_y = sum(y))
z.group
z max_x max_y sum_x sum_y
1 1 0 4 0
2 1 1 5 1
Now I need to add sum_x, sum_y, max_x, and max_y as columns of sample.data, with each group's values repeated on the matching rows. For example, if max_x is 1 for z=1, then max_x should be 1 in every row with z=1. The expected output is:
x y z max_x max_y sum_x sum_y
1 0 1 1 0 4 0
1 0 1 1 0 4 0
1 0 1 1 0 4 0
1 0 1 1 0 4 0
1 0 2 1 1 5 1
1 0 2 1 1 5 1
1 0 2 1 1 5 1
1 0 2 1 1 5 1
1 0 2 1 1 5 1
0 1 2 1 1 5 1
How do I get the expected output?
You can do it directly in one step, using transform.
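A minimal sketch of what this probably means, assuming the suggestion is to pass transform to ddply in place of summarize (the answer's own code is not shown, so this is a guess at it). The data frame is rebuilt from the question's listing; result and base.result are names chosen here for illustration. The second call is a different technique, base R's transform with ave, included only as a plyr-free alternative.

```r
library(plyr)

# Rebuild the sample data from the question
sample.data <- data.frame(
  x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  y = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  z = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# Assumed reading of the answer: transform keeps every row of each z group
# and appends the group-level values, whereas summarize collapses each
# group to a single row.
result <- ddply(sample.data, .(z), transform,
                max_x = max(x), max_y = max(y),
                sum_x = sum(x), sum_y = sum(y))

# Alternative without plyr (not from the answer): ave() computes a
# per-group statistic and repeats it on every row of that group.
base.result <- transform(sample.data,
                         max_x = ave(x, z, FUN = max),
                         max_y = ave(y, z, FUN = max),
                         sum_x = ave(x, z, FUN = sum),
                         sum_y = ave(y, z, FUN = sum))
```

With ddply the rows come back grouped by z, while the ave version keeps the original row order. For this sample the two orders coincide, because the data is already sorted by z.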