I am trying to use ddply on my sample data (called z), which looks like this:
id y
1001 10
1001 11
1200 12
2001 10
2030 12
2100 32
3100 10
3190 13
4100 45
5100 67
5670 56
...
10001 54
10345 45
11234 32
and so on
My goal is to find the sum of y for the ids starting with 1 (i.e. 1001, 1200, ...), 2 (2001, 2030, 2100), 3 (3100, 3190), 4, …, 10, 11, …, 65. For example, for ids starting with 1 the sum is 10 + 11 + 12 = 33, and for ids starting with 2 it is 10 + 12 + 32 = 54.
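One way to pin down what "starting with" means here is to assume that every id is its group number followed by exactly three more digits (true for every id shown). Under that assumption the group is plain integer division by 1000. Comparing it with taking the first character shows why 10001 must go to group 10, not group 1:

```r
# A few ids from the question
ids <- c(1001, 1200, 2100, 3190, 10001, 11234)

# Assumption: the group is everything before the last three digits,
# so integer division by 1000 drops those digits
ids %/% 1000  # gives 1, 1, 2, 3, 10, 11

# Taking only the first character would wrongly put 10001 and 11234 in group "1"
first_char <- substr(as.character(ids), 1, 1)
first_char
```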
I tried lapply as follows:
s <- split(z, z$id)
lapply(s, function(x) sum(x$y))  # sum of y within each id
However, this gives me the sum for each unique id, not for each leading-digit group as I was looking for. Any suggestions would be appreciated.
thelatemail provides a valid approach, but I want to point out that the problem isn't really your understanding of lapply (your code was almost correct); it's how you define the groups. thelatemail's solution builds the right grouping variable, and that's the key. I'm going to show you your approach adjusted, then how I would actually approach this, and then a version using ave just because I never get to use it 🙂
Read in data
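A minimal way to get the sample into R is read.table() on inline text. Only the rows shown in the question are included; the elided "..." rows are left out, so the sums further down cover just these 14 rows:

```r
# Rows copied from the question; the "..." rows are not included
z <- read.table(text = "
id    y
1001  10
1001  11
1200  12
2001  10
2030  12
2100  32
3100  10
3190  13
4100  45
5100  67
5670  56
10001 54
10345 45
11234 32
", header = TRUE)

str(z)  # 14 obs. of 2 integer variables: id, y
```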
Your code adjusted
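A sketch of the adjusted code, assuming the grouping rule is "drop the last three digits of id" (id %/% 1000). The only real changes are splitting on that key instead of on id, and using sum() because x[, "y"] is already a plain vector, which colSums() rejects:

```r
z <- data.frame(
  id = c(1001, 1001, 1200, 2001, 2030, 2100, 3100, 3190,
         4100, 5100, 5670, 10001, 10345, 11234),
  y  = c(10, 11, 12, 10, 12, 32, 10, 13, 45, 67, 56, 54, 45, 32)
)

# Split on the group key rather than on id itself
s <- split(z, z$id %/% 1000)

# One sum per group; group "1" is 10 + 11 + 12 = 33
res <- lapply(s, function(x) sum(x$y))
unlist(res)
```

Group 1 comes out as 33, matching the example in the question.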
Approach I would likely take: add a new factor id variable
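The following is a sketch of that idea, not necessarily the exact code originally posted. It again assumes the group is id %/% 1000, and the column names group and group_sum are made up for illustration. Once the factor exists, tapply() or aggregate() gives one sum per group, while ave() (the variant mentioned above) writes each row's group total back alongside the original rows:

```r
z <- data.frame(
  id = c(1001, 1001, 1200, 2001, 2030, 2100, 3100, 3190,
         4100, 5100, 5670, 10001, 10345, 11234),
  y  = c(10, 11, 12, 10, 12, 32, 10, 13, 45, 67, 56, 54, 45, 32)
)

# New grouping variable (hypothetical name): id with its last three digits dropped
z$group <- factor(z$id %/% 1000)

# One sum per group, as a named array
tapply(z$y, z$group, sum)

# The same sums as a data frame with columns group and y
agg <- aggregate(y ~ group, data = z, FUN = sum)
agg

# ave() returns the group sum on every row, aligned with z
z$group_sum <- ave(z$y, z$group, FUN = sum)
head(z)
```

The factor version keeps the grouping in the data, so the same column can drive tapply, aggregate, ave, or ddply.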