Here is a small example to illustrate my data:
> df <- data.frame(subgroup=rep(paste("s",1:3, sep=""), times=3),
feature=c(rep("a",6), rep("b",3)),
var=rep(1:3, each=3),
data=c(rnorm(3,1), rnorm(3,2), rnorm(3,0)))
> df
subgroup feature var data
1 s1 a 1 1.53152620
2 s2 a 1 1.25476445
3 s3 a 1 1.04221040
4 s1 a 2 1.68913400
5 s2 a 2 1.48290273
6 s3 a 2 1.62871854
7 s1 b 3 0.05278296
8 s2 b 3 -0.66623654
9 s3 b 3 -1.40006454
I want to examine the sum of the “data” column for each feature-var combination present in my dataset. More precisely, I want to obtain TRUE when the sum is greater than 3, and FALSE otherwise:
> result
feature var res
1 a 1 TRUE
2 a 2 TRUE
3 b 3 FALSE
I tried using “aggregate” and “by”, but couldn’t get them to fit my needs. Any ideas? Thanks in advance.
One approach is to use plyr's function ddply to group on feature and var. You can use the summarize function to create a new data.frame with a column that corresponds to the rule you developed.
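The code for this approach did not survive in the post, so here is a minimal sketch of what such a ddply call could look like. It assumes the plyr package is installed, and it hard-codes the values printed in the question instead of calling rnorm so that the outcome is reproducible. The column name res is taken from the desired output in the question.

```r
library(plyr)

# Rebuild the example data using the values printed in the question
# (hard-coded rather than drawn with rnorm, so the result is reproducible).
df <- data.frame(
  subgroup = rep(paste("s", 1:3, sep = ""), times = 3),
  feature  = c(rep("a", 6), rep("b", 3)),
  var      = rep(1:3, each = 3),
  data     = c(1.53152620, 1.25476445, 1.04221040,
               1.68913400, 1.48290273, 1.62871854,
               0.05278296, -0.66623654, -1.40006454)
)

# Split by each feature/var combination present in df, then summarize
# every piece into one row holding the logical test on the group sum.
result <- ddply(df, .(feature, var), summarize, res = sum(data) > 3)
print(result)
```

With these values the group sums are roughly 3.83, 4.80 and -2.01, so res should come out TRUE for a/1 and a/2 and FALSE for b/3.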
Another option is to use data.table, which is supposed to provide some performance benefits.
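The original code for this answer is missing as well, so the following is only a sketch of how the same grouping might be written with data.table. It assumes the data.table package is installed and reuses the values printed in the question. The name dt is just an illustrative choice.

```r
library(data.table)

# Same example data as in the question, with the printed values hard-coded.
df <- data.frame(
  subgroup = rep(paste("s", 1:3, sep = ""), times = 3),
  feature  = c(rep("a", 6), rep("b", 3)),
  var      = rep(1:3, each = 3),
  data     = c(1.53152620, 1.25476445, 1.04221040,
               1.68913400, 1.48290273, 1.62871854,
               0.05278296, -0.66623654, -1.40006454)
)

# Convert to a data.table (dt is a hypothetical name for this sketch).
dt <- as.data.table(df)

# For each feature/var group, compute whether the sum of data exceeds 3.
result <- dt[, list(res = sum(data) > 3), by = c("feature", "var")]
print(result)
```

Groups are returned in order of first appearance, which for this data is a/1, a/2, b/3. Whether the performance benefit matters depends on the size of the data; for nine rows either approach is fine.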