I am trying to reshape/reduce my data. So far I use a for loop, which is very slow, but as far as I can tell this should be quite fast with plyr.
I have many groups (firms, stored as a factor in the dataset), and I want to drop every firm that has a 0 in `value` in any of its rows. I therefore create a new data.frame that leaves out every group showing a 0 for `value` at some point.
Data creation:

```r
set.seed(1)
mydf <- data.frame(firmname = sample(LETTERS[1:5], 40, replace = TRUE),
                   value = rpois(40, 2),
                   stringsAsFactors = TRUE)  # keep firmname a factor in R >= 4.0
```

The for loop:

```r
splitby <- mydf$firmname
new.data <- data.frame()
for (firm in unique(as.character(splitby))) {
  # all rows belonging to the current firm
  temp <- subset(mydf, firmname == firm)
  # keep the firm only if all of its values are greater than 0
  if (all(temp$value > 0)) {
    new.data <- rbind(new.data, temp)
  }
}
# Delete all empty firm factor levels
new.data$firmname <- factor(new.data$firmname)
```
Is there a way to achieve that with the plyr package? Can the subset function be used in that context?
EDIT: To make the problem reproducible, I have added the data-creation code, as suggested by BenBarnes. Ben, thanks a lot for that. I have also altered my code to comply with the answers provided below.
You could supply an anonymous function to the `.fun` argument in `ddply()`, or subset with `[`, as suggested by Andrie. Note that the results of `ddply()` are sorted according to `firmname`.

EDIT

For the example in your comments, selecting only firms with more than three entries, subsetting with `[` is again faster than using `ddply()` to subset.
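A minimal sketch of the filters discussed in this answer, written against the `mydf` from the question. It assumes the plyr package is installed (the `ddply()` parts are skipped otherwise), and the object names `zero_firms`, `by_ddply`, `by_bracket`, `many_firms`, `by_count` and `by_count_ddply` are illustrative only. The `[` versions are one way to follow Andrie's suggestion, not necessarily the exact code he proposed.

```r
# Sketch only: assumes plyr is installed for the ddply() parts.
set.seed(1)
mydf <- data.frame(firmname = sample(LETTERS[1:5], 40, replace = TRUE),
                   value = rpois(40, 2),
                   stringsAsFactors = TRUE)

## Drop every firm that has a 0 anywhere in `value`

# (a) ddply(): the anonymous .fun returns a firm's rows when all values are
#     positive and NULL otherwise; ddply() drops the NULL pieces and returns
#     the remaining rows sorted by firmname.
if (requireNamespace("plyr", quietly = TRUE)) {
  by_ddply <- plyr::ddply(mydf, "firmname", .fun = function(d) {
    if (all(d$value > 0)) d else NULL
  })
}

# (b) [ : find the firms with at least one 0, then keep all other rows.
#     The row order of mydf is preserved.
zero_firms <- unique(mydf$firmname[mydf$value == 0])
by_bracket <- mydf[!(mydf$firmname %in% zero_firms), ]
by_bracket$firmname <- droplevels(by_bracket$firmname)

## Keep only firms with more than three entries

# (c) [ with table(): count rows per firm and keep firms with count > 3.
firm_counts <- table(mydf$firmname)
many_firms <- names(firm_counts)[firm_counts > 3]
by_count <- mydf[mydf$firmname %in% many_firms, ]

# (d) The equivalent ddply() call, for comparison.
if (requireNamespace("plyr", quietly = TRUE)) {
  by_count_ddply <- plyr::ddply(mydf, "firmname", .fun = function(d) {
    if (nrow(d) > 3) d else NULL
  })
}
```

Because `[` keeps the original row order while `ddply()` sorts by `firmname`, the two results contain the same rows but may list them in a different order.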