I’m trying to use the aggregate function with cbind, but I must be missing something.
I’ve seen in “Using Aggregate for Multiple Aggregations” that I can simply define which column I want to be fixed and which I’d like to add, but I just can’t get the result I expected.
I have:
x <- data.frame(alfa = 1:9, beta = rep(1:3, 3))
  alfa beta
1    1    1
2    2    2
3    3    3
4    4    1
5    5    2
6    6    3
7    7    1
8    8    2
9    9    3
And I want to retrieve the mean of the entries aggregated by the ones in column beta. For that I’ve tried:
aggregate(cbind(alfa) ~ beta, data = x, FUN = function(x) c(gama = mean(x)) )
That gives me:
  beta alfa
1    1    4
2    2    5
3    3    6
Shouldn’t the result be something like:
  alfa beta gama
1    1    1    4
2    2    2    5
3    3    3    6
How do I force the addition of column gama? Additionally, would someone clarify the basis of the cbind() function? I’ve been struggling to understand it. Regards!
aggregate takes all elements on the left side of the ~ and applies the given function to those values, grouped by the values on the right side.

Thus, your command will return the mean values of alfa grouped by beta. (As you mentioned SQL: this is the same as what happens with the SQL clause SELECT beta, avg(alfa) FROM x GROUP BY beta.)

If you want to output the first value encountered as well, that is basically another aggregation, so your aggregation function has to return two values, for example the minimum and the mean of alfa. (Again in SQL: SELECT beta, min(alfa), avg(alfa) FROM x GROUP BY beta.)

You asked about cbind. As long as you have only one argument on the left-hand side, it does not matter at all. But suppose your data frame had a further column, gamma, and you would like to compute, say, the mean of both columns alfa and gamma. You could do that by putting cbind(alfa, gamma) on the left-hand side of the formula. That way you just tell aggregate to throw both alfa and gamma at the given function.

For more exhaustive examples, see ?aggregate.

Edit
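To make the calls described above concrete, here is a minimal sketch using the data frame from the question. The names first and v, the gamma column with the values 9:1, and the do.call(data.frame, ...) flattening step are assumptions added for illustration; they are not taken from the question.

```r
# Data from the question.
x <- data.frame(alfa = 1:9, beta = rep(1:3, 3))

# Mean of alfa within each beta group.
# Wrapping a single column in cbind() would change nothing here.
means <- aggregate(alfa ~ beta, data = x, FUN = mean)

# Two summaries per group: FUN returns a named vector of length two.
# The result keeps them in ONE matrix column called alfa,
# with sub-columns "first" and "gama" (names chosen for this sketch).
both <- aggregate(alfa ~ beta, data = x,
                  FUN = function(v) c(first = min(v), gama = mean(v)))

# Flatten the matrix column into ordinary columns:
# beta, alfa.first, alfa.gama
flat <- do.call(data.frame, both)

# Hypothetical third column, to illustrate cbind() on the left-hand side.
x$gamma <- 9:1
two_cols <- aggregate(cbind(alfa, gamma) ~ beta, data = x, FUN = mean)

print(means)
print(flat)
print(two_cols)
```

Note that the two-value version does not produce three top-level columns by itself, which is why the flattening step is included; without it, both$alfa is a 3 x 2 matrix.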
You have to be careful not to mix up the different meanings of cbind. Used as a separate function, it combines two vectors (or data.frames) of the same length into a matrix (or data.frame) with the inputs as separate columns.

Used in the formula notation of aggregate, cbind does something related but different. cbind(column1, column2) just tells aggregate to apply the given function to both columns separately. Thus, a function that needs to see both columns at once will not work. Rather, the function will be called two times per group: once with the values of alfa, then with the values of beta.

Hope that clarifies your understanding.
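The difference between the two uses of cbind can be checked with a small sketch. The grouping column grp and the helper names m, seen and lens are hypothetical and exist only for this illustration.

```r
x <- data.frame(alfa = 1:9, beta = rep(1:3, 3))

# Standalone cbind(): two vectors of equal length become a two-column matrix.
m <- cbind(a = 1:3, b = 4:6)
print(dim(m))

# Hypothetical grouping column, so that alfa and beta can both sit on the left.
x$grp <- rep(c("u", "v", "w"), each = 3)

# Inside an aggregate() formula, FUN receives one column of one group at a time.
# It therefore sees a plain vector (no dim attribute), never a two-column matrix.
seen <- aggregate(cbind(alfa, beta) ~ grp, data = x,
                  FUN = function(v) is.null(dim(v)))

# Each call gets the 3 values of a single column, not 6 values from both.
lens <- aggregate(cbind(alfa, beta) ~ grp, data = x, FUN = length)

print(seen)
print(lens)
```

Because FUN only ever sees one column, a function that tries to index a second column (for example v[, 2]) would fail here.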