I have a dataframe:
df <- data.frame(
  Species = rep(LETTERS[1:4], times=c(5,6,7,6)),
  Length = rep(11:14, each=3)
)
df
I need to count the number of individuals of each Length within each Species (i.e., how many individuals in Species A have a length of 11, 12, 13, etc.?). Then I need to perform a series of additional analyses on the output. For example, I need to calculate the density of individuals of each length, and the decrease in density from one length class to the next.
This is easy if I subset the data first:
Spec.A<-df[df$Species=="A",]
#count number of specimens of each length;
count<-table(Spec.A$Length)
count
#calculate density per length category (divide by total area sampled =30)
density<-count/(30)
density
#calculate the decrease in density (delta.N) from one length category to the next;
delta.N<-diff(density, lag=1, differences=1)
delta.N
The problem is that I need to do these calculations for each species (i.e., to loop through each subset).
On the one hand, I could use tapply(), with a function that uses table();
#function: count number of specimens of each length;
count<-function(x){
table(x)
}
Number<-tapply(df$Length, df$Species, FUN=count, simplify=FALSE)
Number
This gives me what I want, but the format of the output is funky, and I can’t figure out how to perform additional analyses on the results.
I have tried using ddply() from plyr, something like:
ddply(df$Length, df$Species,
count)
But I clearly don’t have it right, and I’m not even sure ddply() is appropriate for my problem, given that I have a different number of length observations for each species.
Should I be looking more closely at other options in plyr? Or is there a way to write a for loop to do what I need?
You’re on the right track!
tapply with list output is definitely one way to go, and may be a good choice since your outputs will have varying lengths. ddply, like you guessed, is another way. The key is that the output of the function you give to ddply should be a data frame with all your statistics in a “long” mode (so that they will stack nicely). The simple count function can’t do this, so you’ll need to make your own function. The way I go about devising a function for a ddply call like this is actually very similar to what you were doing: I get a subset of the data, and then craft my function using that. Then, when you submit it to ddply, it’ll nicely apply that function across all the subsets.
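Here is a rough sketch of what such a function might look like, built from the same steps you used on the Species A subset. The function name length_stats, the area argument, and the output column names are just labels I made up for illustration, and I'm assuming the sampled area of 30 from your example applies to every species. The first length class of each species has no previous class to compare against, so its Delta.N is set to NA.

```r
library(plyr)

# Example data from the question (Length is recycled to 24 rows)
df <- data.frame(
  Species = rep(LETTERS[1:4], times = c(5, 6, 7, 6)),
  Length  = rep(11:14, each = 3)
)

# Hypothetical helper: takes the rows for ONE species and returns a
# "long" data frame with one row per length class found in that subset.
length_stats <- function(sub, area = 30) {
  counts  <- table(sub$Length)           # specimens per length class
  density <- as.numeric(counts) / area   # density per length class
  data.frame(
    Length  = as.numeric(names(counts)),
    Count   = as.vector(counts),
    Density = density,
    Delta.N = c(NA, diff(density))       # change from the previous class
  )
}

# Try it on a single subset first, just as in the question
length_stats(df[df$Species == "A", ])

# Then let ddply apply it to every species and stack the results
result <- ddply(df, "Species", length_stats)
result
```

Because each call returns a data frame with the same columns, it doesn't matter that the species have different numbers of length classes; ddply just stacks however many rows each subset produces and adds the Species column for you.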