I understand what tapply() does in R. However, I cannot parse this description of it from the documentation:
Apply a Function Over a "Ragged" Array
Description:
Apply a function to each cell of a ragged array, that is to each
(non-empty) group of values given by a unique combination of the
levels of certain factors.
Usage:
tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
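To make "each cell of a ragged array" concrete, here is a small sketch with made-up data (the vectors `x`, `g1` and `g2` are hypothetical). With two factors, each cell of the result corresponds to one combination of levels. The groups feeding those cells have different sizes, and a combination with no observations is never passed to FUN; it comes back as `NA`.

```r
# Hypothetical data: 6 values and two grouping factors
x  <- c(1, 2, 3, 4, 5, 6)
g1 <- factor(c("a", "a", "a", "b", "b", "b"))
g2 <- factor(c("u", "u", "v", "u", "u", "u"))

# Group sizes per combination of levels: unequal, and b/v is empty
sizes <- table(g1, g2)
print(sizes)

# One cell per combination; FUN (sum) is only called on non-empty groups,
# so the empty b/v cell is filled with NA
res <- tapply(x, list(g1, g2), sum)
print(res)
```

Here the non-empty groups have sizes 2, 1 and 3, which is the "ragged" part of the description.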
When I think of tapply, I think of GROUP BY in SQL: you group the values in X by the corresponding factor levels in INDEX and apply FUN to each group. I have read the description of tapply 100 times and still can’t see how what it says maps onto my understanding of tapply. Can someone help me parse it?
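The GROUP BY analogy can be checked directly. A sketch with a hypothetical `sales` data frame: the `tapply` call below plays the role of `SELECT region, SUM(amount) FROM sales GROUP BY region`, returning one total per level of `region`.

```r
# Hypothetical table; the SQL equivalent would be:
#   SELECT region, SUM(amount) FROM sales GROUP BY region
sales <- data.frame(
  region = c("east", "west", "east", "north", "west", "east"),
  amount = c(10, 20, 30, 40, 50, 60)
)

# X = column to aggregate, INDEX = grouping column, FUN = aggregate function.
# INDEX is converted to a factor, so groups come out in sorted level order.
totals <- tapply(sales$amount, sales$region, sum)
print(totals)
```

If you want a data frame back, as SQL would return, `aggregate(amount ~ region, data = sales, FUN = sum)` is the closer analogue.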
Let’s see what the R documentation says on the subject:
The factors you supply via INDEX together specify a collection of subsets of X, of possibly different lengths (hence the ‘ragged’ descriptor). FUN is then applied to each subset.
EDIT: @Joris makes an excellent point in the comments. It may be helpful to think of tapply(X, Y, ...) as a wrapper for sapply(split(X, Y), ...): if Y is a list of grouping factors, tapply builds a new, single grouping factor from the unique combinations of their levels, splits X accordingly, and applies FUN to each piece.
EDIT: Here’s an illustrative example:
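A minimal sketch of the sapply(split(X, Y), ...) view, using made-up vectors `x`, `g` and `h`. For a single factor the two calls give the same numbers, and `split` exposes the ragged pieces themselves. For a list of factors, `split` combines them into one factor (via `interaction()`), which is the "single grouping factor" described above.

```r
x <- c(5, 1, 4, 2, 8, 3, 7)
g <- factor(c("p", "q", "p", "r", "q", "p", "r"))

# The ragged pieces: one vector per level, of unequal lengths (3, 2, 2)
pieces <- split(x, g)
print(lengths(pieces))

via_tapply <- tapply(x, g, mean)
via_split  <- sapply(pieces, mean)

# Same values (tapply returns a 1-d array, sapply a named vector)
print(all.equal(as.vector(via_tapply), unname(via_split)))

# Two factors: split() builds one combined factor with levels like "p.s"
h <- factor(c("s", "s", "t", "t", "s", "t", "s"))
flat <- sapply(split(x, list(g, h)), sum)  # one entry per combination
grid <- tapply(x, list(g, h), sum)         # same sums as a g-by-h table
print(flat)
print(grid)
# The combination q/t is empty: sapply reports sum(numeric(0)), i.e. 0,
# whereas tapply never calls FUN on it and reports NA
```

So the wrapper view holds for the non-empty groups; the visible differences are the shape of the result and how empty combinations are reported.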