I know my problem is simple, but not for me. Here is a small dataset.
mark1 <- c("AB", "BB", "AB", "BB", "BB", "AB", "--", "BB")
mark2 <- c("AB", "AB", "AA", "BB", "BB", "AA", "--", "BB")
mark3 <- c("BB", "AB", "AA", "BB", "BB", "AA", "--", "BB")
mark4 <- c("AA", "AB", "AA", "BB", "BB", "AA", "--", "BB")
mark5 <- c("AB", "AB", "AA", "BB", "BB", "AA", "--", "BB")
mark6 <- c("--", "BB", "AA", "BB", "BB", "AA", "--", "BB")
mark7 <- c("AB", "--", "AA", "BB", "BB", "AA", "--", "BB")
mark8 <- c("BB", "AA", "AA", "BB", "BB", "AA", "--", "BB")
mymark <- data.frame (mark1, mark2, mark3, mark4, mark5, mark6, mark7, mark8)
tmymark <- data.frame (t(mymark))
names (tmymark) <- c("P1", "P2","I1", "I2", "I3", "I4", "KL", "MN")
Thus the dataset becomes:
      P1 P2 I1 I2 I3 I4 KL MN
mark1 AB BB AB BB BB AB -- BB
mark2 AB AB AA BB BB AA -- BB
mark3 BB AB AA BB BB AA -- BB
mark4 AA AB AA BB BB AA -- BB
mark5 AB AB AA BB BB AA -- BB
mark6 -- BB AA BB BB AA -- BB
mark7 AB -- AA BB BB AA -- BB
mark8 BB AA AA BB BB AA -- BB
I want to classify mark1 to mark8 based on a comparison of P1 and P2, using code that creates a new variable:
loctype <- NULL
if (tmymark$P1 == "AB" & tmymark$P2 == "AB") {
  loctype = "<hkxhk>"
} else {
  if (tmymark$P1 == "AB" & tmymark$P2 == "BB") {
    loctype = "<lmxll>"
  } else {
    if (tmymark$P1 == "AA" & tmymark$P2 == "AB") {
      loctype = "<nnxnp>"
    } else {
      if (tmymark$P1 == "AA" & tmymark$P2 == "BB") {
        loctype = "MN"
      } else {
        if (tmymark$P1 == "BB" & tmymark$P2 == "AA") {
          loctype = "MN"
        } else {
          if (tmymark$P1 == "--" & tmymark$P2 == "AA") {
            loctype = "NR"
          } else {
            if (tmymark$P1 == "AA" & tmymark$P2 == "--") {
              loctype = "NR"
            } else {
              cat ("error wrong input in P1 or P2")
            }
          }
        }
      }
    }
  }
}
What I am trying to do here is compare the P1 and P2 values and generate a new variable.
For example, if tmymark$P1 == "AB" & tmymark$P2 == "AB", the loctype should be "<hkxhk>". If not, the second condition will be applied, and so on.
Here is my error message.
Warning messages:
1: In if (tmymark$P1 == "AB" & tmymark$P2 == "AB") { :
the condition has length > 1 and only the first element will be used
2: In if (tmymark$P1 == "AB" & tmymark$P2 == "BB") { :
the condition has length > 1 and only the first element will be used
Once the loctype vector is generated, I want to recode tmymark using the information in this variable:
tmymark1 <- data.frame (loctype, tmymark)
require(car)
for(i in 2:length(tmymark)){
if (loctype = "<hkxhk>") {
tmymark[[i]] <- recode (x, "AB" = "hk", "BA" = "hk", "AA" = "hh", "BB" = "kk")
} else {
if (loctype = "<lmxll>") {
tmymark[[i]] <- recode ((x, "AB" = "lm", "BA" = "lm", "AA" = "--", "BB" = "kk")
} else {
if (loctype = "<nnxnp>") {
tmymark[[i]] <- recode ((x, "AB" = "np", "BA" = "np", "AA" = "nn", "BB" = "--")
} else {
if (loctype = "MN") {
tmymark[[i]] <- "--"
} esle {
if (loctype = "NR") {
tmymark[[i]] <- "NA"
} else {
cat ("error wrong input code")
} } }}}
Am I on the right track?
Edit: expected output
      loctype P1 P2 I1 I2 I3 I4 KL MN
mark1 <lmxmm> lm mm lm mm mm lm -- mm
mark2 <hkxhk> hk hk hh kk kk hh -- kk
mark3 <nnxnp> nn np nn -- -- nn -- --
and so on
match is definitely the way to go. I'd make two data frames as keys, key1 and key2, and then use match on key1 to get the loctype (as Justin also recommends), and also on both the rownames and columns of key2 to get the desired substitution, using matrix indexing to get the desired value from the key.
In the result, the missing values are because I don't have values for those combinations in my keys.
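The key tables themselves are not shown above, so here is only a minimal sketch of the idea, not the original code. The layouts of key1 and key2 are assumptions, filled in from the if/else chain and the recode calls in the question (the question's expected output uses "mm" where its recode call uses "kk", so that row may need adjusting). The names geno, row_idx and col_idx are illustrative, and the keys are stored here as a plain data frame and a character matrix.

```r
# Genotypes as a character matrix (same content as tmymark in the question).
geno <- rbind(
  mark1 = c("AB", "BB", "AB", "BB", "BB", "AB", "--", "BB"),
  mark2 = c("AB", "AB", "AA", "BB", "BB", "AA", "--", "BB"),
  mark3 = c("BB", "AB", "AA", "BB", "BB", "AA", "--", "BB"),
  mark4 = c("AA", "AB", "AA", "BB", "BB", "AA", "--", "BB"),
  mark5 = c("AB", "AB", "AA", "BB", "BB", "AA", "--", "BB"),
  mark6 = c("--", "BB", "AA", "BB", "BB", "AA", "--", "BB"),
  mark7 = c("AB", "--", "AA", "BB", "BB", "AA", "--", "BB"),
  mark8 = c("BB", "AA", "AA", "BB", "BB", "AA", "--", "BB")
)
colnames(geno) <- c("P1", "P2", "I1", "I2", "I3", "I4", "KL", "MN")

# key1 (assumed layout): one row per P1/P2 combination from the question.
key1 <- data.frame(
  P1      = c("AB", "AB", "AA", "AA", "BB", "--", "AA"),
  P2      = c("AB", "BB", "AB", "BB", "AA", "AA", "--"),
  loctype = c("<hkxhk>", "<lmxll>", "<nnxnp>", "MN", "MN", "NR", "NR"),
  stringsAsFactors = FALSE
)

# key2 (assumed layout): rows are loctypes, columns are the original codes.
key2 <- rbind(
  "<hkxhk>" = c("hh", "hk", "hk", "kk", "--"),
  "<lmxll>" = c("--", "lm", "lm", "kk", "--"),
  "<nnxnp>" = c("nn", "np", "np", "--", "--"),
  "MN"      = c("--", "--", "--", "--", "--"),
  "NR"      = c(NA, NA, NA, NA, NA)
)
colnames(key2) <- c("AA", "AB", "BA", "BB", "--")

# Vectorised lookup of loctype: match each P1/P2 pair against key1.
# Pairs that are not in key1 give NA instead of an error.
loctype <- key1$loctype[match(paste(geno[, "P1"], geno[, "P2"]),
                              paste(key1$P1, key1$P2))]

# Matrix indexing into key2: one (row, column) pair per cell of geno.
row_idx <- match(loctype, rownames(key2))  # one entry per marker
col_idx <- match(geno, colnames(key2))     # one entry per cell, column-major
recoded <- geno
recoded[] <- key2[cbind(row_idx[row(geno)], col_idx)]

result <- data.frame(loctype, recoded, stringsAsFactors = FALSE)
print(result)
# mark2 (<hkxhk>) becomes: hk hk hh kk kk hh -- kk
```

Because match returns NA for anything missing from a key, markers such as mark3 (P1 = "BB", P2 = "AB") get an NA loctype and an all-NA row; adding the missing combinations to key1 fills those in. This also avoids the "condition has length > 1" warning, since no if() is applied to a whole column.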