I am scoring a psychometric instrument at work and want to recode a few variables. Basically, each question has five possible responses, worth 0 to 4 respectively. That is how they were coded into our database, so I don’t need to do anything except sum those. However, there are three questions that have reversed scores (so, when someone answers 0, we score that as 4). Thus, I am “reversing” those ones.
The data frame basically looks like this:
studyid timepoint date inst_q01 inst_q02 ... inst_q20
1 2 1995-03-13 0 2 ... 4
2 2 1995-06-15 1 3 ... 4
Here’s what I’ve done so far.
# Survey Processing
# Find missing values (-9) and confusions (-1), and sum them
project_f03$inst_nmiss <- rowSums(project_f03[,4:23]==-9)
project_f03$inst_nconfuse <- rowSums(project_f03[,4:23]==-1)
project_f03$inst_nmisstot <- project_f03$inst_nmiss + project_f03$inst_nconfuse
# Recode any missing values into NAs
for(x in 4:23) {project_f03[project_f03[,x]==-9 | project_f03[,x]==-1,x] <- NA}
rm(x)
So far, everything works fine, and I am about to recode the three reversed questions. My initial thought was to loop through the three variables and run a series of assignment statements, something like below:
# Questions 3, 11, and 16 are reversed
for(x in c(3,11,16)+3) {
project_f03[project_f03[,x]==4,x] <- 5
project_f03[project_f03[,x]==3,x] <- 6
project_f03[project_f03[,x]==2,x] <- 7
project_f03[project_f03[,x]==1,x] <- 8
project_f03[project_f03[,x]==0,x] <- 9
project_f03[,x] <- project_f03[,x]-5
}
rm(x)
So, the five assignment statements just reassign new values, and the loop just takes it through all three of the variables in question. Since I was reversing the scale, I thought it was easiest to offset everything by 5 and then just subtract five after all recodes were done. The main issue, though, is that there are NAs and those NAs result in errors in the loop (naturally, NA==4 returns an NA in R). Duh – forgot a basic rule!
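To make the failure concrete, here is a minimal sketch on a made-up one-column data frame (the name toy and its values are assumptions for illustration, not the real project_f03 data). It shows that comparing against NA gives NA, and that a data frame refuses an assignment whose row index contains NA.

```r
# Toy data frame; the column name and values are invented for illustration.
toy <- data.frame(inst_q03 = c(0, 4, NA, 2))

# Comparing against NA yields NA rather than TRUE or FALSE.
cmp <- toy[, 1] == 4
print(cmp)

# An NA in the row index of a data frame assignment raises an error,
# so we capture the message instead of letting the script stop.
res <- tryCatch({
  toy[toy[, 1] == 4, 1] <- 5
  "no error"
}, error = function(e) conditionMessage(e))
print(res)
```

The error comes from the data frame assignment method, which is why the same pattern on a plain vector does not necessarily fail.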
I’ve come up with three alternatives, but I’m not sure which is the best.
- First, I could obviously just move the NA-creating code after the loop, and it should work fine. Pros: easiest to implement. Cons: Only works if I am receiving data with no innate (versus created) NAs.
- Second, I could change the logic statement to be something like:
project_f03[!is.na(project_f03[,x]) & project_f03[,x]==4,x]
which should eliminate the logic conflict. Pros: not too hard, I know it works. Cons: A lot of extra code, seems like a kludge.
- Finally, I could change the logic from
project_f03[project_f03[,x]==4,x] <- 5
to
project_f03[project_f03[,x] %in% 4,x] <- 5
This seems to work fine, but I’m not sure if it’s a good practice, and wanted to get thoughts. Pros: quick fix for this issue and seems to work; preserves general syntactic flow of “blah blah LOGIC blah <- bleh”. Cons: Might create black hole? Not sure what the potential implications of using %in% like this might be.
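As a rough check of the second and third alternatives, the sketch below compares the three conditions on a small made-up vector (v is an assumption for illustration). Note that it uses the element-wise & rather than &&, since && is meant for single values.

```r
# Made-up scores with one missing value.
v <- c(0, 4, NA, 2, 4)

plain   <- v == 4               # NA where v is NA
guarded <- !is.na(v) & v == 4   # element-wise AND; FALSE where v is NA
via_in  <- v %in% 4             # %in% never returns NA

print(plain)
print(guarded)
print(via_in)

# The guarded comparison and %in% agree on every element.
same <- identical(guarded, via_in)
print(same)
```

Because %in% is built on match(), a missing value simply fails to match and comes back as FALSE, so the resulting index is safe to use in an assignment.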
EDITED TO MAKE CLEAR
This question has one primary component: Is it safe to utilize %in% as described in the third point above when doing logical operations, or are there reasons not to do so?
The second component is: What are recommended ways of reversing the values, like some have described in answers and comments?
The straightforward answer is that there is no black hole to using %in%. But in instances where I want to just discard the NA values, I’d use which:
project_f03[which(project_f03[,x]==4),x] <- 5
%in% could also shorten that earlier bit of code you had.
Like @flodel suggested, you can replace that whole block of code in your for-loop with
project_f03[,x] <- rev(0:4)[match(project_f03[,x], 0:4, nomatch=10)]
It should preserve NA. And there are probably more opportunities to simplify code.
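Here is a small sketch of these suggestions on invented data (toy, its column values, and the variable names are all assumptions). The %in% recode of -9 and -1 is only a guess at which "earlier bit of code" was meant, and the 4 - value line is an arithmetic alternative that is not part of the answer; it is included just to cross-check the lookup.

```r
# which() keeps only the positions that are TRUE and drops the NA.
v <- c(0, 4, NA, 2, 4)
hits <- which(v == 4)
print(hits)

# Toy data; column names mirror the question, values are invented.
toy <- data.frame(inst_q03 = c(0, 1, -9, 3, 4),
                  inst_q11 = c(4, -1, 2, 1, 0))

# A possible %in% version of the missing-value recode (a guess, see above).
for (x in 1:2) {
  toy[toy[, x] %in% c(-9, -1), x] <- NA
}

# Reverse with the lookup from the answer: 0 <-> 4, 1 <-> 3, 2 stays 2.
# NA fails to match, gets index 10, and rev(0:4)[10] is NA.
via_match <- toy
for (x in 1:2) {
  via_match[, x] <- rev(0:4)[match(via_match[, x], 0:4, nomatch = 10)]
}
print(via_match)

# Arithmetic alternative (an assumption, not from the answer): 4 - value.
via_arith <- toy
via_arith[, 1:2] <- 4 - via_arith[, 1:2]

# Compare the two approaches column by column.
same_result <- identical(as.numeric(via_match[, 1]), via_arith[, 1]) &&
  identical(as.numeric(via_match[, 2]), via_arith[, 2])
print(same_result)
```

Both approaches leave NA untouched, so they can run after the missing-value recode without any special handling.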