This is a follow-up to an earlier question, which was itself inspired by another question, but the problem here is not quite the same.
This is my situation. First, I pull some data from a database:
df <- data.frame(id = 1:6,
                 profession = c(1, 5, 4, NA, 0, 5))
df
# id profession
#  1          1
#  2          5
#  3          4
#  4         NA
#  5          0
#  6          5
Second, I pull a key table with human-readable labels for the profession codes:
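profession.codes <- data.frame(profession.code = c(1, 2, 3, 4, 5),
                               profession.label = c('Optometrists',
                                 'Accountants', 'Veterinarians',
                                 'Financial analysts', 'Nurses'),
                               # the code below relies on factor levels;
                               # factors are the default only before R 4.0.0
                               stringsAsFactors = TRUE)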
profession.codes
# profession.code   profession.label
#               1       Optometrists
#               2        Accountants
#               3      Veterinarians
#               4 Financial analysts
#               5             Nurses
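As an aside, a one-to-one key table like this can also be applied with base R's factor(), passing the codes as `levels` and the labels as `labels`. This is a minimal sketch, not the approach used in the question. It rebuilds both tables with stringsAsFactors = FALSE (an assumption, so the result does not depend on the R version). Note that any code missing from the key table, such as the 0, becomes NA just like a real NA:

```r
# Rebuild the example data from the question
# (stringsAsFactors = FALSE is an assumption for this sketch)
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))
profession.codes <- data.frame(
  profession.code  = c(1, 2, 3, 4, 5),
  profession.label = c("Optometrists", "Accountants", "Veterinarians",
                       "Financial analysts", "Nurses"),
  stringsAsFactors = FALSE
)

# Each code is replaced by the label at the same position in the key table;
# codes not listed in `levels` (here 0) and NA both end up as NA
labelled <- factor(df$profession,
                   levels = profession.codes$profession.code,
                   labels = profession.codes$profession.label)
print(labelled)
```

This is why a plain recode is not enough here: it cannot tell the NA apart from the unknown 0.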
Now, I would like to overwrite the profession variable in my df with the labels from profession.codes, preferably using join from the plyr package, but I'm open to any smart solution. I do like that plyr's join preserves the row order of x.
Currently, I do it like this:
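On row order: base R's match() gives a join-like lookup that leaves the rows of x untouched, whereas merge() with its default sort = TRUE reorders the result by the key column. This minimal sketch contrasts the two on the same data. The tables are rebuilt with stringsAsFactors = FALSE as an assumption, and the names idx, looked.up and merged are illustrative only:

```r
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))
profession.codes <- data.frame(
  profession.code  = c(1, 2, 3, 4, 5),
  profession.label = c("Optometrists", "Accountants", "Veterinarians",
                       "Financial analysts", "Nurses"),
  stringsAsFactors = FALSE
)

# match() returns, for each row of df, the matching row of the key table
# (NA when there is none), so the result stays in df's row order
idx <- match(df$profession, profession.codes$profession.code)
looked.up <- profession.codes$profession.label[idx]
print(data.frame(id = df$id, profession = looked.up))

# merge() with all.x = TRUE is a left join too, but sorts rows by the key
merged <- merge(df, profession.codes,
                by.x = "profession", by.y = "profession.code", all.x = TRUE)
print(merged[, c("id", "profession", "profession.label")])
```

merge() also accepts sort = FALSE, but its documentation describes the row order then as unspecified, so match() is the simpler choice when order matters.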
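# install.packages('plyr', dependencies = TRUE)
library(plyr)
# join() needs the key column to have the same name in both tables
profession.codes$profession <- profession.codes$profession.code
# left join: keeps all rows of df, in their original order
df <- join(df, profession.codes, by = "profession")
# levels(df$profession.label)
# add the codes that found no label (here the 0) as extra factor levels
df$profession.label <- factor(df$profession.label,
                              levels = c(levels(df$profession.label),
                                         setdiff(df$profession, df$profession.code)))
# levels(df$profession.label)
# write the unmatched code into the label column (hard-coded to 0)
df$profession.label[df$profession == 0] <- 0
# drop the helper columns and rename the label column to profession
df$profession.code <- NULL
df$profession <- NULL
names(df) <- c("id", "profession")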
df
# id         profession
#  1       Optometrists
#  2             Nurses
#  3 Financial analysts
#  4               <NA>
#  5                  0
#  6             Nurses
This is how I overwrite profession without losing the NA and the 0.
The problem is that the 0 is hard-coded: the unmatched code could just as well be 17, or any other number, and I would like to account for that in some way. I would also like to shorten my code, if possible.
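One way to avoid hard-coding the 0 is to look the labels up with match() and then, for every code that is present but missing from the key table, keep the code itself as text. This is a minimal sketch in base R rather than plyr. The extra row with code 17 is hypothetical, added only to show that any unmatched number is handled. The result is a character column rather than a factor:

```r
# Example data from the question plus a hypothetical row with code 17
df <- data.frame(id = 1:7, profession = c(1, 5, 4, NA, 0, 5, 17))
profession.codes <- data.frame(
  profession.code  = c(1, 2, 3, 4, 5),
  profession.label = c("Optometrists", "Accountants", "Veterinarians",
                       "Financial analysts", "Nurses"),
  stringsAsFactors = FALSE
)

# Label for each row; NA where the code is NA or not in the key table
label <- profession.codes$profession.label[
  match(df$profession, profession.codes$profession.code)]

# Rows with a real code that has no label (here 0 and 17): keep the code
unmatched <- is.na(label) & !is.na(df$profession)
label[unmatched] <- as.character(df$profession[unmatched])

# Overwrite the codes in place; row order and the NA are preserved
df$profession <- label
print(df)
```

The sketch assumes profession.label is character. If it is a factor, convert it with as.character() first, because assigning a value that is not an existing level to a factor produces NA with a warning.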
Any help would be greatly appreciated.
Thanks,
Eric
This is one approach in base R:
Which yields: