I’m trying to wrap my head around closures, and I think I’ve found a case where they might be helpful.
I have the following pieces to work with:
- A set of regular expressions designed to clean state names, housed in a function
- A data.frame with state names (of the standardized form that the function above creates) and state ID codes, to link the two (the “merge map”)
The idea is, given some data.frame with sloppy state names (is the capital listed as “Washington, D.C.”, “washington DC”, “District of Columbia”, etc.?), to have a single function return the same data.frame with the state name column removed and only the state ID codes remaining. Then subsequent merges can happen consistently.
I can do this in any number of ways, but one way that seems particularly elegant would be to house the merge map, the regular expressions, and the code that processes everything inside a closure (following the idea that a closure is a function with data).
Question 1: Is this a reasonable idea?
Question 2: If so, how do I do it in R?
Here’s a stupid simple clean state names function that works on the example data:
cleanStateNames <- function(x) {
  x <- tolower(x)
  x[grepl("columbia", x)] <- "DC"
  x
}
Here’s some example data that the eventual function will be run on:
dat <- structure(list(state = c("Alabama", "Alaska", "Arizona", "Arkansas",
"California", "Colorado", "Connecticut", "Delaware", "District of Columbia",
"Florida"), pop08 = structure(c(29L, 44L, 40L, 18L, 25L, 30L,
22L, 48L, 36L, 13L), .Label = c("1,050,788", "1,288,198", "1,315,809",
"1,316,456", "1,523,816", "1,783,432", "1,814,468", "1,984,356",
"10,003,422", "11,485,910", "12,448,279", "12,901,563", "18,328,340",
"19,490,297", "2,600,167", "2,736,424", "2,802,134", "2,855,390",
"2,938,618", "24,326,974", "3,002,555", "3,501,252", "3,642,361",
"3,790,060", "36,756,666", "4,269,245", "4,410,796", "4,479,800",
"4,661,900", "4,939,456", "5,220,393", "5,627,967", "5,633,597",
"5,911,605", "532,668", "591,833", "6,214,888", "6,376,792",
"6,497,967", "6,500,180", "6,549,224", "621,270", "641,481",
"686,293", "7,769,089", "8,682,661", "804,194", "873,092", "9,222,414",
"9,685,744", "967,440"), class = "factor")), .Names = c("state",
"pop08"), row.names = c(NA, 10L), class = "data.frame")
And a sample merge map (the actual one links FIPS codes to states, so it can’t be trivially generated):
merge_map <- data.frame(state = cleanStateNames(dat$state), id = seq(10))
EDIT Building off of crippledlambda’s answer below, here’s an attempt at the function:
prepForMerge <- local({
  merge_map <- structure(list(
    state = c("alabama", "alaska", "arizona", "arkansas", "california",
              "colorado", "connecticut", "delaware", "DC", "florida"),
    id = 1:10), .Names = c("state", "id"), row.names = c(NA, -10L),
    class = "data.frame")
  list(
    replace_merge_map = function(new_merge_map) {
      merge_map <<- new_merge_map
    },
    show_merge_map = function() {
      merge_map
    },
    return_prepped_data.frame = function(dat) {
      dat$state <- cleanStateNames(dat$state)
      dat <- merge(dat, merge_map)
      dat <- subset(dat, select = c(-state))
      dat
    }
  )
})
> prepForMerge$return_prepped_data.frame(dat)
        pop08 id
1   4,661,900  1
2     686,293  2
3   6,500,180  3
4   2,855,390  4
5  36,756,666  5
6   4,939,456  6
7   3,501,252  7
8     591,833  9
9     873,092  8
10 18,328,340 10
Two problems remain before I’d consider this question solved:
- Calling prepForMerge$return_prepped_data.frame(dat) is painful each time. Any way to have a default function such that I could just call prepForMerge(dat)? I’m guessing not given how it’s implemented, but perhaps there’s at least a convention for the default fxn…
- How do I avoid mixing the data and code in the merge_map definition? Ideally I’d clean merge_map elsewhere, then just grab it inside the closure and store that.
I may be missing the point of your question, but this is one way in which you can use a closure:
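As a minimal sketch of the kind of closure meant here (the vector contents and the names stateCleaner, show, clean and cleanPermanently are illustrative assumptions, not anything from the question), local() creates a private environment holding statenames, and the returned functions can read or update it without taking it as an argument:

```r
# Hypothetical example: a list of functions sharing one private variable.
stateCleaner <- local({
  statenames <- c("Alabama", "District of Columbia", "Florida")

  # Helper: returns a cleaned copy without touching statenames itself.
  cleaned <- function() {
    x <- tolower(statenames)
    x[grepl("columbia", x)] <- "DC"
    x
  }

  list(
    show = function() statenames,
    clean = cleaned,
    cleanPermanently = function() {
      # <<- assigns to statenames in the enclosing (local) environment.
      statenames <<- cleaned()
      invisible(statenames)
    }
  )
})

stateCleaner$clean()             # cleaned copy; the stored vector is untouched
stateCleaner$show()              # still the original values
stateCleaner$cleanPermanently()  # overwrites the stored vector
stateCleaner$show()              # now the cleaned values
```

The only difference between clean and cleanPermanently is the <<- assignment, which is what makes the change persist between calls.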
But to generalize, you can replace statenames with your data frame definition, and return a function (or list of functions) which uses this data frame without having to pass it as an argument to the function call. Example (but note I’ve used the ignore.case=TRUE argument in grepl):
Just like the first example:
Just returns the lexically-scoped value of statenames to check that the original values are unchanged:
Do the same thing, but make the change “permanent”:
And note that the value of statenames attached to these functions has changed.
In any case, you can replace statenames with a data frame, and these simple functions with a “merge map” or any other mapping you desire.
Edit
Speaking of “merge”, is this what you’re looking for? An implementation of the first ?merge example using a closure:
Edit 2
To read a file into an object which will be lexically bound, you can either do
or
or
and so on. (I assume the function(…) will have to do with your “merge_map”). You can also use evalq in place of local. To “bring in” objects residing in the global space (or enclosing environment), you can just do the following
then modifying globalobj later will not change localobj attached to the function (since almost(?) everything in R follows pass-by-value semantics). You can also use with instead of local as shown in examples above.
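To make the globalobj/localobj point concrete, here is a small sketch under assumed data (the two-row merge map and the function name prep are hypothetical). The map is prepared outside the closure, copied in when local() is evaluated, and the closure keeps that copy even if the global object is modified afterwards:

```r
# Hypothetical merge map, prepared outside the closure.
globalobj <- data.frame(state = c("alabama", "DC"), id = c(1L, 9L))

prep <- local({
  localobj <- globalobj          # value captured when local() runs
  function(dat) merge(dat, localobj)
})

# Changing the global object afterwards does not affect the captured copy.
globalobj$id <- c(100L, 900L)

res <- prep(data.frame(state = "DC", pop = 5))
res$id   # still the id from the captured map, not 900
```

Because local() returns a single function here rather than a list, it can be called directly as prep(dat), which may also be a reasonable pattern when only one “default” operation is needed.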