I'm trying to create a dummy variable with the values "good" and "bad" based on the numbers in the HOUSE column. A house should be "good" if its HOUSE value is 1, 2, or 9, and "bad" otherwise.
I am pasting the dput output of my data.frame object.
## dput output assigned to the housetype variable
housetype <- structure(list(Price = c(10L, 20L, 31L, 41L, 52L, 63L, 45L, 63L,
64L, 45L), Location = structure(c(4L, 7L, 6L, 3L, 2L, 4L, 5L,
1L, 6L, 8L), .Label = c("AK", "ATL", "BOS", "DC", "GA", "MA",
"NYC", "PA"), class = "factor"), HOUSE = c(1L, 1L, 1L, 2L, 6L,
7L, 8L, 9L, 10L, 11L)), .Names = c("Price", "Location", "HOUSE"
), class = "data.frame", row.names = c(NA, -10L))
How can I create the dummy variable so that each row still keeps its other information (Price and Location)?
Thanks!!!
You can simply do:
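A minimal sketch of one way to do this, assuming the data frame is stored in `housetype` as in the question. The column name `quality` is only an illustrative choice. Adding a column with `ifelse` and `%in%` leaves Price and Location untouched:

```r
# Data from the question, rebuilt here so the example runs on its own
housetype <- data.frame(
  Price    = c(10L, 20L, 31L, 41L, 52L, 63L, 45L, 63L, 64L, 45L),
  Location = factor(c("DC", "NYC", "MA", "BOS", "ATL", "DC", "GA", "AK", "MA", "PA")),
  HOUSE    = c(1L, 1L, 1L, 2L, 6L, 7L, 8L, 9L, 10L, 11L)
)

# "good" when HOUSE is 1, 2 or 9, "bad" otherwise.
# `quality` is a hypothetical column name; the other columns are kept as they are.
housetype$quality <- ifelse(housetype$HOUSE %in% c(1, 2, 9), "good", "bad")

print(housetype)
```

If the column is meant to be used as a grouping variable later, it can be wrapped in `factor()`.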
Instead of creating a character vector ("good" or "bad"), it is good practice to create a flag variable, i.e. a logical vector (TRUE or FALSE). It uses less memory and is generally easier to work with:
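A sketch of the logical-flag version, again assuming the data frame is called `housetype`. The column name `is_good` is an illustrative choice, not something from the question:

```r
# Data from the question, rebuilt here so the example runs on its own
housetype <- data.frame(
  Price    = c(10L, 20L, 31L, 41L, 52L, 63L, 45L, 63L, 64L, 45L),
  Location = factor(c("DC", "NYC", "MA", "BOS", "ATL", "DC", "GA", "AK", "MA", "PA")),
  HOUSE    = c(1L, 1L, 1L, 2L, 6L, 7L, 8L, 9L, 10L, 11L)
)

# %in% already returns a logical vector, so no ifelse() is needed.
# `is_good` is a hypothetical column name.
housetype$is_good <- housetype$HOUSE %in% c(1, 2, 9)

# A logical flag can be used directly for subsetting and counting
good_houses <- housetype[housetype$is_good, ]
n_good      <- sum(housetype$is_good)
```

With a logical column, filtering is just `housetype[housetype$is_good, ]` and the negation is `!housetype$is_good`, with no string comparison involved.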