Note: this is a direct follow-up to this previous question.
I have a very long data frame with two columns that I use as arguments to a function that computes a third column, using mapply like so:
df$`3rd` <- mapply(myfunction, A = df$`1st`, B = df$`2nd`)
Here myfunction takes arguments A and B. This works well for small datasets but stalls on large ones, so I thought a good approach might be to apply the function with ddply. I don't know whether ddply is the best approach for this problem, and I am also having trouble with the syntax, so suggestions on either would be appreciated.
This is what I am trying:
df$`3rd` <- ddply(df, .(`1st`), function(x) x$`3rd` <-
    mapply(myfunction, A = x$`1st`, B = df$second))
and this is the error I am getting:
Error in `$<-.data.frame`(`*tmp*`, "n", value = c(1L, 1L, 1L, 1L, 1L, :
replacement has 112 rows, data has 16
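For context, this message comes from assigning a vector into a data frame column when the vector's length does not match the number of rows. The sketch below reproduces it on a tiny hypothetical data frame (df, g, second and the A + B body of myfunction are all made up for illustration), mixing one group's subset x with the full df in the same way the ddply attempt above does:

```r
# Hypothetical data: 4 rows in two groups of 2
df <- data.frame(g = c(1, 1, 2, 2), second = c(10, 20, 30, 40))

# Hypothetical stand-in for myfunction
myfunction <- function(A, B) A + B

# One group's subset, as ddply would pass it to the anonymous function
x <- df[df$g == 1, ]

# A comes from the 2-row subset, B from the full 4-row data frame;
# mapply recycles the shorter A, so it returns 4 values
res <- mapply(myfunction, A = x$g, B = df$second)

# Assigning 4 values into a 2-row data frame fails with the same kind of error
msg <- tryCatch({
  x$third <- res
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Taking both arguments from the subset keeps the lengths consistent
x$third <- mapply(myfunction, A = x$g, B = x$second)
print(x)
```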
EDIT:
In light of the answer and comments, I am posting a small reproducible example below; it is taken from one of the answers to the previous question. However, as the commenters below note, ddply is probably not the way to go. I am trying Ramnath's solution right now.
library(reshape2)
foo <- data.frame(x = c('a', 'a', 'a', 'b', 'b', 'b'),
                  y = c('ab', 'ac', 'ad', 'ae', 'fx', 'fy'))
bar <- data.frame(x = c('c', 'c', 'c', 'd', 'd', 'd'),
                  y = c('ab', 'xy', 'xz', 'xy', 'fx', 'xz'))
nShared <- function(A, B) {
  length(intersect(with(foo, y[x == A]), with(bar, y[x == B])))
}
# Enumerate all combinations of groups in foo and bar
(combos <- expand.grid(foo.x=unique(foo$x), bar.x=unique(bar$x)))
# Find number of elements in common among all pairs of groups
combos$n <- mapply(nShared, A=combos$foo.x, B=combos$bar.x)
# Reshape results into matrix form
dcast(combos, foo.x ~ bar.x, value.var = "n")
# foo.x c d
# 1 a 1 0
# 2 b 0 1
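Because the underlying problem is mapply being slow on large inputs, here is one fully vectorised way to compute the same shared-element counts for this particular example. It swaps mapply for matrix multiplication, which is a substitution of my own rather than anything from the original answers: build a group-by-y indicator matrix for foo and for bar, and the product of one with the transpose of the other counts the shared y values for every pair of groups at once. The names all_y, foo_inc, bar_inc and shared are made up for this sketch.

```r
foo <- data.frame(x = c('a', 'a', 'a', 'b', 'b', 'b'),
                  y = c('ab', 'ac', 'ad', 'ae', 'fx', 'fy'),
                  stringsAsFactors = FALSE)
bar <- data.frame(x = c('c', 'c', 'c', 'd', 'd', 'd'),
                  y = c('ab', 'xy', 'xz', 'xy', 'fx', 'xz'),
                  stringsAsFactors = FALSE)

# Give both indicator matrices the same columns: every y value seen anywhere
all_y <- union(foo$y, bar$y)

# Rows = groups, columns = y values, entry 1 if the group contains that y.
# Clamping the counts with > 0 mirrors intersect(), which ignores duplicates.
foo_inc <- 1 * (unclass(table(foo$x, factor(foo$y, levels = all_y))) > 0)
bar_inc <- 1 * (unclass(table(bar$x, factor(bar$y, levels = all_y))) > 0)

# Entry [A, B] = number of y values shared by group A (foo) and group B (bar)
shared <- foo_inc %*% t(bar_inc)
print(shared)
```

The result already has the matrix layout that dcast produced. With many groups and many distinct y values the dense indicator matrices can get large; a sparse representation (for example from the Matrix package) would be a natural next step to try.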
ddply isn't what you're after here. ddply(df, .(`1st`), FUNCTION) splits df into subsets according to the values of column `1st`, applies FUNCTION to each subset, and combines the results into a new data frame. It is not a way of filling in a column of df row by row.

In any case, I think your error might come from using df instead of x for the B argument in function(x) x$`3rd` <- mapply(myfunction, A = x$`1st`, B = df$second). Inside each group, mapply would then see every value of df$second rather than just the group's rows, which would explain a replacement of 112 rows going into a subset of 16. It is hard to tell for sure without a working example, though.

What exactly does myfunction do? I think your best bet is to vectorise myfunction so that you can simply write df$third <- myfunction(A = df$first, B = df$second).

For example, if myfunction <- function(A, B) { A + B }, then instead of mapply(myfunction, df$first, df$second) you could equivalently call myfunction(df$first, df$second) and not need mapply at all.
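To make the vectorisation point concrete, here is a small sketch with made-up data. The first part uses the A + B example; the second shows a hypothetical scalar-only function (gap_scalar) written with if, which only works one row at a time, alongside an ifelse() rewrite (gap_vec) that works on whole columns. In both cases the vectorised call gives the same result as mapply while calling the function only once.

```r
# Made-up data for illustration
df <- data.frame(first = c(1, 5, 3), second = c(4, 2, 3))

# A + B is already vectorised, so mapply is unnecessary
myfunction <- function(A, B) { A + B }
via_mapply <- mapply(myfunction, df$first, df$second)  # one call per row
df$third <- myfunction(df$first, df$second)            # one call in total
stopifnot(identical(via_mapply, df$third))

# Hypothetical scalar-only function: `if` expects a single TRUE/FALSE,
# so this only works row by row (e.g. via mapply)
gap_scalar <- function(A, B) {
  if (A > B) A - B else B - A
}

# Vectorised rewrite of the same logic: ifelse() works element-wise
gap_vec <- function(A, B) {
  ifelse(A > B, A - B, B - A)
}

df$gap <- gap_vec(df$first, df$second)
stopifnot(identical(mapply(gap_scalar, df$first, df$second), df$gap))
print(df)
```

Note that base R's Vectorize() is not a shortcut here: it wraps the function in a call to mapply, so it lets the function accept vectors without making it any faster.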