I have the following dataset:
name1 <- c("P1", "P2", "IndA", "IndB", "IndC", "IndD", "IndE", "IndF", "IndG")
name2 <- c("P1", "P2", "IndH", "IndI", "IndJ", "IndK")
name3 <- c("P1", "P2", "IndL", "IndM", "IndN")
name <- c(name1, name2, name3)
A <- c(1, 3, 1, 2, 2, 5, 5, 1, 4, 1, 3, 3, 1, 4, 3, 1, 1, 3,2,1 )
B <- c(2, 4, 3, 4, 2, 2, 6, 2, 2, 1, 4, 3, 1, 1, 5, 2,2, 1, 2, 1 )
family = c(rep(1, length (name1)), rep(2, length (name2)), rep(3, length (name3)))
mydf <- data.frame (family, name, A, B)
The following is the process I want to apply to each level of the family variable (shown here with the first two rows, P1 and P2, of the whole data frame):
dum.match<-rbind(expand.grid(c(mydf[1,3:4]),c(mydf[2,3:4])),
expand.grid(c(mydf[2,3:4]), c(mydf [1,3:4])))
newmydf<-cbind(mydf, correct = paste(mydf$A,mydf$B)%in%paste(dum.match$Var1,
dum.match$Var2))
So I wrote a function:
err.chk <- function (x) {
dum.match<-rbind(expand.grid(c(x[1,3:4]),c(x[2,3:4])),
expand.grid(c(x[2,3:4]),c(x[1,3:4])))
newmydf<-cbind(x, correct = paste(x$A,mydf$B)%in%paste(dum.match$Var1,
dum.match$Var2))
return (newmydf)
}
Now I want to create 3 separate datasets, one for each level of family, apply the above function to each, and combine the results into the above data frame with an additional column, correct. How can I do it? I tried the following (and the results are awful!):
require(plyr)
aaply(mydf, 1, err.chk)
Edit:
Expected output:
family name A B correct
1 1 P1 1 2 FALSE
2 1 P2 3 4 FALSE
3 1 IndA 1 3 TRUE
4 1 IndB 2 4 TRUE
5 1 IndC 2 2 FALSE
6 1 IndD 5 2 FALSE
7 1 IndE 5 6 FALSE
8 1 IndF 1 2 FALSE
9 1 IndG 4 2 TRUE
10 2 P1 1 1 FALSE
11 2 P2 3 4 FALSE
12 2 IndH 3 3 FALSE
13 2 IndI 1 1 FALSE
14 2 IndJ 4 1 TRUE
15 2 IndK 3 5 FALSE
16 3 P1 1 2 TRUE
17 3 P2 1 2 TRUE
18 3 IndL 3 1 FALSE
19 3 IndM 2 2 TRUE
20 3 IndN 1 1 TRUE
Just for family = 3 (similarly for the other families):
# just data for family 3
name <- c("P1", "P2", "IndL", "IndM", "IndN")
A <- c(1, 1, 3,2,1 )
B <- c(2,2, 1, 2, 1)
mydf <- data.frame (name, A, B)
err.chk(fam3)
name A B correct
16 P1 1 2 TRUE
17 P2 1 2 TRUE
18 IndL 3 1 FALSE
19 IndM 2 2 TRUE
20 IndN 1 1 TRUE
It’s hard to follow exactly what you’re doing, but with plyr you want to use a **ply function that accepts the data type you’re giving it and returns the data type your function returns. In this case, ddply looks like the right choice.
Your function also needs a fix: the line that builds the correct column has mydf$B, which should be x$B. With that fixed, calling it using ddply gives a reasonable-looking result.
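A minimal sketch of that fix and the ddply call, assuming the mydf from the question and that plyr is installed. The unlist() calls and the named expand.grid() columns are a tidy-up I've assumed for readability, not the original code; the only change the fix strictly needs is x$B in place of mydf$B.

```r
library(plyr)

# Data from the question
name1 <- c("P1", "P2", "IndA", "IndB", "IndC", "IndD", "IndE", "IndF", "IndG")
name2 <- c("P1", "P2", "IndH", "IndI", "IndJ", "IndK")
name3 <- c("P1", "P2", "IndL", "IndM", "IndN")
name <- c(name1, name2, name3)
A <- c(1, 3, 1, 2, 2, 5, 5, 1, 4, 1, 3, 3, 1, 4, 3, 1, 1, 3, 2, 1)
B <- c(2, 4, 3, 4, 2, 2, 6, 2, 2, 1, 4, 3, 1, 1, 5, 2, 2, 1, 2, 1)
family <- c(rep(1, length(name1)), rep(2, length(name2)), rep(3, length(name3)))
mydf <- data.frame(family, name, A, B)

# x is the subset of mydf for one family; rows 1 and 2 are P1 and P2
err.chk <- function(x) {
  p1 <- unlist(x[1, c("A", "B")])
  p2 <- unlist(x[2, c("A", "B")])
  # every pairing of one value from P1 with one from P2, in both orders
  dum.match <- rbind(expand.grid(Var1 = p1, Var2 = p2),
                     expand.grid(Var1 = p2, Var2 = p1))
  # note x$B here, not mydf$B
  x$correct <- paste(x$A, x$B) %in% paste(dum.match$Var1, dum.match$Var2)
  x
}

# split mydf by family, apply err.chk to each piece, bind the pieces back together
result <- ddply(mydf, .(family), err.chk)
print(result)
```

If I've read the question right, the correct column this produces should line up with the expected output posted in the edit. ddply fits here because the input is a data frame and err.chk returns a data frame; aaply expects an array on both sides, which is why the original attempt went wrong.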