With the help of flodel I found a way to replace numeric codes with value labels from a lookup table.
Ambitious as I am, I now want to put that into a function. I also have a lot of lookup tables I need to apply to my data, so a function would be handy.
But first some sample data, starting with a data frame,
df <- data.frame(id = c(1:6),
                 profession = c(1, 5, 4, NA, 0, 5))
df
# id profession
# 1 1
# 2 5
# 3 4
# 4 NA
# 5 0
# 6 5
and a lookup table with human-readable information about the profession codes,
profession.lookuptable <- c(Optometrists=1, Accountants=2, Veterinarians=3,
                            `Financial analysts`=4, Nurses=5)
flodel showed me how to replace numeric codes with value labels from a lookup table, like this:
match.idx <- match(df$profession, profession.lookuptable)
df$profession <- ifelse(is.na(match.idx),
                        df$profession, names(profession.lookuptable)[match.idx])
df
# id profession
# 1 Optometrists
# 2 Nurses
# 3 Financial analysts
# 4 <NA>
# 5 0
# 6 Nurses
I now want to put this into a function where I can state the data frame df and the name of the variable profession and have the function take care of the rest.
I define my function like this,
ADDlookup <- function(orginalDF, orginalVAR) {
  DF.VAR <- paste(orginalDF, "$", orginalVAR, sep="")
  lookup.table <- paste(orginalVAR, ".lookuptable")
  match.idx <- match(DF.VAR, lookup.table)
  DF.VAR <- ifelse(is.na(match.idx), DF.VAR, names(lookup.table)[match.idx])
}
but apparently that is not working
ADDlookup(df, profession)
I get the error message
Error in paste(orginalDF, "$", orginalVAR, sep = "") :
object 'profession' not found
Now, this is where I get stuck.
Can anyone please tell me which manual page I need to read, or maybe give me a friendly hint on how to solve this?
Thank you for reading.
It’s because you’re passing profession into the ADDlookup function, but no object with that name exists: profession is only a column inside df.
The way you’ve written your function, you have to distinguish between using the character vector containing the name of the variable, and the variable itself.
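To make that distinction concrete, here is a minimal sketch using the sample df from the question. The names var.name, as.column and as.string are only illustrative and do not appear in the original code.

```r
# Sample data frame from the question.
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))

# A character vector that holds the *name* of the column.
var.name <- "profession"

# Using the name to fetch the column gives the actual values.
as.column <- df[[var.name]]

# Pasting pieces together only ever gives another string,
# here the text "df$profession", not the column it spells out.
as.string <- paste("df", "$", var.name, sep = "")

is.numeric(as.column)    # TRUE
is.character(as.string)  # TRUE
```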
For example, your first few lines, paste(originalDF, '$', originalVAR, sep='') etc., appear to expect originalDF and originalVAR to be strings, and you’ll have DF.VAR being the string 'df$profession'. However, when you do match it looks like you want DF.VAR to be the variable df$profession.
This is how I suggest you get around it:
– pass in originalDF as an object, being df
– pass in originalVAR as a string, being 'profession' (it’s a column name and hence a string)
Then, retrieve the column named by originalVAR from the data frame via originalDF[, originalVAR].
Now your next line, where you look for the object profession.lookuptable, is a little trickier: you construct the string 'profession.lookuptable', and then you want to look up the object that has that name.
For this, you can use get (see ?get). For example, get('df') will return the df data frame, so calling get on the constructed string will retrieve the object called 'profession.lookuptable'. It follows the same rules as if you’d typed profession.lookuptable directly, so you have to make sure that the function can “see” that object (in your case you should be fine).
Next, it looks like you want to return the originalDF data frame where the originalVAR column has been substituted with the lookup values.
I’ll just modify the originalDF[, originalVAR] column by replacing it with the lookup values.
NOTE that we are not actually modifying the df data frame that you passed in as an argument to ADDlookup; R makes a copy of the data frame within the function. So, your original df is preserved.
Finally, we have to return the data frame from the function.
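The individual steps can be tried at the top level before wrapping them in a function. This is only a sketch, assuming the df and profession.lookuptable from the question; lookup.name is a helper name introduced here for illustration.

```r
# Sample data and lookup table, copied from the question.
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))
profession.lookuptable <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                            `Financial analysts` = 4, Nurses = 5)

originalDF  <- df            # the data frame itself, not its name
originalVAR <- "profession"  # the column name, as a string

# Step 1: pull the column out of the data frame by name.
DF.VAR <- originalDF[, originalVAR]

# Step 2: build the lookup table's name, then fetch the object with get().
# sep = "" matters: paste()'s default separator is a space, which would
# produce "profession .lookuptable".
lookup.name  <- paste(originalVAR, ".lookuptable", sep = "")
lookup.table <- get(lookup.name)

# Step 3: swap codes for labels, keeping any code that has no label.
match.idx <- match(DF.VAR, lookup.table)
originalDF[, originalVAR] <- ifelse(is.na(match.idx), DF.VAR,
                                    names(lookup.table)[match.idx])
```

Here originalDF is already a copy, so df itself still holds the numeric codes afterwards.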
All together now:
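A sketch of how the assembled function might look, following the steps described in this answer. It assumes a lookup table named <column name>.lookuptable is visible from inside the function.

```r
ADDlookup <- function(originalDF, originalVAR) {
  # originalDF is a data frame; originalVAR is a column name given as a string.
  DF.VAR <- originalDF[, originalVAR]

  # Look up the object called e.g. "profession.lookuptable".
  lookup.table <- get(paste(originalVAR, ".lookuptable", sep = ""))

  # Replace codes that have a label; leave unmatched codes as they are.
  match.idx <- match(DF.VAR, lookup.table)
  originalDF[, originalVAR] <- ifelse(is.na(match.idx), DF.VAR,
                                      names(lookup.table)[match.idx])

  # Return the modified copy of the data frame.
  return(originalDF)
}
```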
And now to test it:
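A small test along these lines should do, using the sample data from the question. The function body is a sketch of ADDlookup following the steps in this answer, included so the snippet runs on its own; the name labelled is illustrative.

```r
# Sample data and lookup table from the question.
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))
profession.lookuptable <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                            `Financial analysts` = 4, Nurses = 5)

# Sketch of ADDlookup: column name passed as a string, table found via get().
ADDlookup <- function(originalDF, originalVAR) {
  DF.VAR <- originalDF[, originalVAR]
  lookup.table <- get(paste(originalVAR, ".lookuptable", sep = ""))
  match.idx <- match(DF.VAR, lookup.table)
  originalDF[, originalVAR] <- ifelse(is.na(match.idx), DF.VAR,
                                      names(lookup.table)[match.idx])
  return(originalDF)
}

# Note the quotes around the column name.
labelled <- ADDlookup(df, "profession")

labelled$profession
# "Optometrists" "Nurses" "Financial analysts" NA "0" "Nurses"

df$profession
# still the numeric codes: 1 5 4 NA 0 5
```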
Note that the original df is unmodified; in general R functions do not modify the parameters that are passed in to them.
As another improvement: it is generally a bit dangerous to rely on profession.lookuptable having been created before you call the ADDlookup function.
Instead of the whole lookup.table <- get('profession.lookuptable') shebang (which, depending on whether you have multiple 'profession.lookuptable' tables in various scopes, may not pick up the one you expect), I’d strongly recommend you just pass in the lookup table as a parameter, for example as a third argument (say, lookup.table).
Then you can avoid that entire get(xxxx) line (and all associated scoping problems that go with it).
Then you’d call the function via ADDlookup(df, 'profession', profession.lookuptable).
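A minimal sketch of that variant, with the lookup table passed in explicitly. The argument name lookup.table and the result name labelled are assumptions made for illustration.

```r
# Variant of ADDlookup that takes the lookup table as an argument,
# so nothing has to be found by name with get().
ADDlookup <- function(originalDF, originalVAR, lookup.table) {
  DF.VAR <- originalDF[, originalVAR]
  match.idx <- match(DF.VAR, lookup.table)
  originalDF[, originalVAR] <- ifelse(is.na(match.idx), DF.VAR,
                                      names(lookup.table)[match.idx])
  return(originalDF)
}

# Sample data and lookup table from the question.
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))
profession.lookuptable <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                            `Financial analysts` = 4, Nurses = 5)

# The table is now just another argument, so any table can be used
# for any column, whatever it happens to be called.
labelled <- ADDlookup(df, "profession", profession.lookuptable)
```

With many lookup tables, this also means the tables do not need to follow a naming scheme tied to the column names.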