I am trying to reshape/ reduce my data. So far, I employ a for


Asked: June 2, 2026

I am trying to reshape/reduce my data. So far, I use a for loop (very slow), but as far as I can tell, this should be quite fast with plyr.

I have many groups (firms, stored as a factor in the dataset), and I want to drop entirely every firm that has a 0 in value in any of its rows. So I create a new data.frame that leaves out every group showing a 0 for value at some point.

Data creation:

set.seed(1)
mydf <- data.frame(firmname = sample(LETTERS[1:5], 40, replace = TRUE),
                   value = rpois(40, 2))

The for loop:

splitby <- mydf$firmname
firms <- unique(splitby)

new.data <- data.frame()

for (i in seq_along(firms)) {
  temp <- subset(mydf, splitby == as.character(firms[i]))
  # keep the firm only if none of its values is 0
  if (all(temp$value > 0)) {
    new.data <- rbind(new.data, temp)
  }
}

# Drop firm levels that no longer occur
new.data$firmname <- factor(new.data$firmname)

Is there a way to achieve that with the plyr package? Can the subset function be used in that context?

EDIT: For the purpose of the reproduction of the problem, data creation, as suggested by BenBarnes, is added. Ben, thanks a lot for that. Furthermore, my code is altered so as to comply with the answers provided below.

1 Answer

Editorial Team · Answer 1 · 2026-06-02T17:49:53+00:00

You could supply an anonymous function to the .fun argument in ddply():

set.seed(1)

mydf <- data.frame(firmname = sample(LETTERS[1:5], 40, replace = TRUE),
  value = rpois(40, 2))

library(plyr)

ddply(mydf, .(firmname), function(x) if (any(x$value == 0)) NULL else x)

Or using [, as suggested by Andrie:

firms0 <- unique(mydf$firmname[which(mydf$value == 0)])

mydf[!(mydf$firmname %in% firms0), ]
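One edge case worth checking with the [ approach is a dataset in which no firm has a zero. The following is a minimal sketch on hypothetical toy data (the toy data frame and the names by_which and by_logical are illustrative only, not from the question). It compares negative which() indexing with logical negation:

```r
# Hypothetical toy data in which no firm has a zero entry.
toy <- data.frame(
  firmname = factor(c("A", "B", "A", "B")),
  value    = c(1, 2, 3, 4)
)

# No value equals 0, so this is an empty vector.
firms0 <- unique(toy$firmname[toy$value == 0])

# Negative which(): which() returns integer(0), and -integer(0) is still
# integer(0), so this selects zero rows instead of all rows.
by_which <- toy[-which(toy$firmname %in% firms0), ]

# Logical negation: every row is kept, as intended.
by_logical <- toy[!(toy$firmname %in% firms0), ]

nrow(by_which)    # 0
nrow(by_logical)  # 4
```

Logical negation behaves the same as the negative-index form whenever at least one firm has a zero, so it is the safer of the two to use by default.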

Note that the results of ddply() are sorted by firmname, whereas the [ approach keeps the rows in their original order.
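To illustrate the ordering difference, here is a small sketch on hypothetical toy data that is deliberately not sorted by firm. It assumes plyr is installed; the names toy, via_ddply and via_index are made up for this example:

```r
library(plyr)

# Hypothetical toy data; firm B contains a zero.
toy <- data.frame(
  firmname = factor(c("C", "A", "B", "A", "B", "C")),
  value    = c(3, 1, 0, 2, 4, 5)
)

# ddply(): pieces come back grouped in the order of the firmname levels.
via_ddply <- ddply(toy, .(firmname),
                   function(x) if (any(x$value == 0)) NULL else x)

# Logical indexing: surviving rows stay in their original order.
firms0 <- unique(toy$firmname[toy$value == 0])
via_index <- toy[!(toy$firmname %in% firms0), ]

as.character(via_ddply$firmname)  # "A" "A" "C" "C"
as.character(via_index$firmname)  # "C" "A" "A" "C"
```

Both results contain the same four rows; only the row order differs.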

EDIT

For the example in your comments (selecting only firms with more than three entries), indexing with [ is again faster than using ddply() to subset:

firmTable <- table(mydf$firmname)

firmsGT3 <- names(firmTable)[firmTable > 3]

mydf[mydf$firmname %in% firmsGT3, ]
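As a worked illustration of the table-based filter, here is a sketch on hypothetical toy data in which the counts per firm are known in advance (A has 4 rows, B has 2, C has 5). The toy data and the name kept are assumptions for this example. The final droplevels() step is optional and only removes factor levels that no longer occur, as asked for in the question:

```r
# Hypothetical toy data: A has 4 rows, B has 2 rows, C has 5 rows.
toy <- data.frame(
  firmname = factor(c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C")),
  value    = c(1, 2, 0, 3, 1, 1, 2, 2, 4, 0, 1)
)

# Count rows per firm, then keep the names of firms with more than three.
firmTable <- table(toy$firmname)
firmsGT3  <- names(firmTable)[firmTable > 3]

# Subset by membership, then drop the unused level "B".
kept <- droplevels(toy[toy$firmname %in% firmsGT3, ])

firmsGT3               # "A" "C"
nrow(kept)             # 9
levels(kept$firmname)  # "A" "C"
```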
