I’m trying to do an equivalent group by summary in R through the plyr

Question

Asked: June 8, 2026

I’m trying to do the equivalent of a SQL group-by summary in R using the plyr function ddply. I have a data frame with three columns (say id, period and event). I’d like to count how many times each id appears in the data frame (count(*) ... group by id in SQL) and get the last value of the event column for each id.

Here is an example of what I have and what I’m trying to obtain:

  id period event #original data frame
  1      1     1
  2      1     0
  2      2     1
  3      1     1
  4      1     1
  4      1     0

  id  t  x #what I want to obtain
  1   1  1
  2   2  1
  3   1  1
  4   2  0

This is the simple code I’ve been using for that:

 library(plyr)
 teachers.pp <- read.table("http://www.ats.ucla.edu/stat/examples/alda/teachers_pp.csv", sep=",", header=TRUE) # whole data frame
 datos1 <- ddply(teachers.pp, .(id), function(x) c(t=length(x$id), x=x[length(x$id),3])) # this works fine

Now, I’ve been reading The Split-Apply-Combine Strategy for Data Analysis, which gives an example using a syntax equivalent to the one below:

  datos2=ddply(teachers.pp,.(id), summarise, t=length(id), x=teachers.pp[length(id),3]) #using summarise but the result is not what I want.

This is the data frame I get using datos2

So, my question is: why is this result different from the one I get using the first piece of code, I mean datos1? What am I doing wrong?

It is not clear to me when I should use summarise and when transform. Could you tell me the correct syntax for the ddply function?

1 Answer

Editorial Team · Answer 1 · 2026-06-08T13:59:20+00:00

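When you use summarise, stop referencing the original data frame. Instead, write expressions in terms of the column names. summarise evaluates each expression once per group, with the column names bound to that group’s rows, so length(id) is the group size. teachers.pp[length(id),3], however, still indexes the whole data frame: it returns the third column at the row whose number happens to equal the group size, not the last row of the group.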

You tried this:

ddply(teachers.pp,.(id), summarise, t=length(id), x=teachers.pp[length(id),3])

when what you probably wanted was something more like this:

ddply(teachers.pp,.(id), summarise, t=length(id), x=tail(event,1))
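To see the difference without downloading the CSV, here is a small sketch that rebuilds the six-row example from the question. The data frame name toy and the result names wrong, right and kept are made up for illustration, and the sketch assumes plyr is installed. It also includes a transform call, since the question asks how it differs from summarise: summarise returns one row per group, while transform keeps every original row and adds the new column to it.

```r
library(plyr)

# Toy data copied from the example in the question (name `toy` is illustrative)
toy <- data.frame(
  id     = c(1, 2, 2, 3, 4, 4),
  period = c(1, 1, 2, 1, 1, 1),
  event  = c(1, 0, 1, 1, 1, 0)
)

# Mistaken version: toy[length(id), 3] indexes the WHOLE data frame,
# using the group size as a row number
wrong <- ddply(toy, .(id), summarise, t = length(id), x = toy[length(id), 3])

# Corrected version: `event` refers to the current group's rows only
right <- ddply(toy, .(id), summarise, t = length(id), x = tail(event, 1))

# transform keeps all rows and repeats the group size on each of them
kept <- ddply(toy, .(id), transform, t = length(id))

print(wrong)
print(right)
print(kept)
```

On this toy data the two summarise calls disagree for id 2: the mistaken version picks up the event value from row 2 of the full data frame, whereas the corrected one takes the last event within the group.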
