I am trying to run a jack-knife using Plyr. I have a large dataset

Question

Asked: May 26, 2026

I am trying to run a jack-knife using plyr. I have a large dataset (715 sites over 10 years). I have already calculated the species richness (the count of all species present) in a square for each year, but now I want to calculate new richness values after taking out one species at a time, and have them all in one dataset.

Example data:

Site <- c(1,1,1,1,1,1)
Year <- c(96,96,96,97,97,97)
SpID <- c(1,2,3,1,2,3)
Count <- c(1,1,1,1,1,1)
data <- data.frame(Site, Year, SpID, Count)

So overall for Site 1 the species richness is 3 in both years. If I want to recalculate this without one of the species, it would now be 2.

I have tried using the following code:

foo<-function(z){
    data2 <- subset(data, SpID != (z))
    summaryBy(Count~ Year + Site, 
              data = data2, 
              FUN = function(x) { c(l = length(x)) } )
}

richall<- ddply(data,.(SpID),foo)

But I’m obviously making a mistake somewhere! Any thoughts?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T23:28:01+00:00

With your example data and call to ddply, this is what will happen:

1. ddply will find the different values in the SpID column of your dataset (1, 2 and 3).
2. It will next create a data.frame for each of these unique values. Each of these data.frames will hold only the rows for which the SpID is equal to that unique value (so: a data.frame with the first and fourth rows, one with the second and fifth, and one with the third and last rows).
3. Function foo will now be called, passing each of these data.frames one at a time as its first argument.
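
To see the pieces described above, here is a minimal sketch that uses base R's split() as a stand-in for the grouping step. It is only an analogy for what ddply does internally, and it assumes the example data is held in a data.frame that includes the Count column.

```r
# Example data from the question, held as a data.frame (an assumption)
data <- data.frame(
  Site  = c(1, 1, 1, 1, 1, 1),
  Year  = c(96, 96, 96, 97, 97, 97),
  SpID  = c(1, 2, 3, 1, 2, 3),
  Count = c(1, 1, 1, 1, 1, 1)
)

# One data.frame per unique SpID, similar in spirit to grouping by .(SpID)
pieces <- split(data, data$SpID)

# Each piece holds only the rows of a single species
rowsPerPiece <- sapply(pieces, nrow)
print(pieces[["1"]])
```

Each piece contains one species only, so a function applied to it has no access to the rows of the other species.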

So this will not help with a jack-knife: foo receives the rows for a single species, not the ID of a species to leave out. In fact, I don't see an obvious way of doing that with plyr. In this particular case you're probably better off rigging your own loop with similar logic. Something like:

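listOfResults <- 
    lapply(unique(data$SpID), 
           function(curID) {
               # keep every row except those of the species being left out
               curDF <- data[data$SpID != curID, ]
               # "..." stands for the other arguments of your summaryBy call
               summaryBy(..., data = curDF)
           })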

You can then recombine your results with, e.g., do.call (see ?do.call), for instance do.call(rbind, listOfResults).
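
For a version that runs end to end on the example data, here is a minimal sketch. It makes a few assumptions that are not in the question: the data is a data.frame, base R's aggregate() is swapped in for summaryBy so that no extra package is needed, richness is counted as the number of distinct SpID values, and the column names Richness and RemovedSpID are made up for illustration.

```r
# Example data from the question, held as a data.frame (an assumption)
data <- data.frame(
  Site  = c(1, 1, 1, 1, 1, 1),
  Year  = c(96, 96, 96, 97, 97, 97),
  SpID  = c(1, 2, 3, 1, 2, 3),
  Count = c(1, 1, 1, 1, 1, 1)
)

# Leave out one species at a time and recount richness per Site and Year
listOfResults <- lapply(unique(data$SpID), function(curID) {
  curDF <- data[data$SpID != curID, ]
  # aggregate() is used here in place of summaryBy (a substitution)
  res <- aggregate(SpID ~ Site + Year, data = curDF,
                   FUN = function(x) length(unique(x)))
  names(res)[names(res) == "SpID"] <- "Richness"  # hypothetical column name
  res$RemovedSpID <- curID                        # hypothetical column name
  res
})

# Stack the per-species results into one data.frame
richall <- do.call(rbind, listOfResults)
print(richall)
```

Note that a Site and Year combination whose only species is the one removed would get no row at all in this sketch, rather than a richness of 0.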
