I am trying to clean up some data that has been incorrectly entered. The question for the variable allows for multiple responses out of five choices, numbered as 1 to 5. The data has been entered in the following manner (this is just an example–there are many more variables and many more observations in the actual data frame):
data
          V1
1    1, 2, 3
2    1, 2, 4
3 2, 3, 4, 5
4    1, 3, 4
5    1, 3, 5
6 2, 3, 4, 5
Here’s some code to recreate that example data:
data = data.frame(V1 = c("1, 2, 3", "1, 2, 4", "2, 3, 4, 5",
                         "1, 3, 4", "1, 3, 5", "2, 3, 4, 5"))
What I actually need is the data to be treated more… binary–like a set of “yes/no” questions–entered in a data frame that looks more like:
data
  V1.1 V1.2 V1.3 V1.4 V1.5
1    1    1    1   NA   NA
2    1    1   NA    1   NA
3   NA    1    1    1    1
4    1   NA    1    1   NA
5    1   NA    1   NA    1
6   NA    1    1    1    1
The actual variable names don’t matter at the moment–I can easily fix that. Also, it doesn’t matter too much whether the missing elements are “0”, “NA”, or blank–again, that’s something I can fix later.
I’ve tried using the transform function from the reshape package as well as a few different things with strsplit, but I can’t get either to do what I am looking for.
I’ve also looked at many other related questions on Stack Overflow, but they don’t seem to be quite the same problem.
You just need to write a function and use apply. First, create some dummy data. Next, create a function that takes in a row and transforms it as necessary. Then use apply and transpose the result.
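A minimal sketch of those three steps, assuming every response is a comma-separated list of integers from 1 to 5. The helper name split_choices and the explicit column renaming are assumptions for illustration, not part of the original answer:

```r
# Dummy data, copied from the question
data <- data.frame(V1 = c("1, 2, 3", "1, 2, 4", "2, 3, 4, 5",
                          "1, 3, 4", "1, 3, 5", "2, 3, 4, 5"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: takes one row and returns a vector with 1 where
# a choice was selected and NA where it was not
split_choices <- function(row, choices = 1:5) {
  # Split on commas (plus any following spaces) and convert to integers
  picked <- as.integer(strsplit(row[[1]], ",\\s*")[[1]])
  ifelse(choices %in% picked, 1L, NA_integer_)
}

# apply() over rows returns one column per input row, so transpose it
result <- as.data.frame(t(apply(data, 1, split_choices)))
names(result) <- paste0("V1.", 1:5)
result
```

If zeros are preferred over NA, something like result[is.na(result)] <- 0 should convert them afterwards.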