This is a follow-up question related to my previous post. Below is a more explanatory version of “what I want to do” as opposed to “how do I make this method work”.
The code below produces a “master” database from which I extract elements for further use in other functions. I routinely extract subsets of the data based on the value of a group identification number.
- Objective: I would like to be able to “wrap” the specifications that vary (like the name of the output dataframe and the groups selected) into a function that could be called.
##### generating data for example
set.seed(271828)
n.elements <- c(10,10,12,14,16,18)
group.number <- rep(1001:1006, n.elements)
##### element ids restart at 1 within each group
element.id <- sequence(n.elements)
x1 <- round(rnorm(length(group.number),45, 12), digits=0)
x2 <- round(rbeta(length(group.number),2,4), digits = 2)
data.base <- as.data.frame(cbind(group.number, element.id, x1, x2))
data.base
##### data.base is representative of the large database
##### suppose I need to pull a set together made up of groups:
##### 1003, 1004, and 1001
groups.set.1 <- as.data.frame(c(1003, 1004, 1001))
bank.names <- c("group.number")
colnames(groups.set.1) <- bank.names
set.sort <- matrix(seq(1,nrow(groups.set.1),1))
sort.set.1 <- cbind(groups.set.1, set.sort)
set.1 <- as.data.frame(merge(sort.set.1, data.base,
by="group.number", all.x=TRUE))
##### this is how the dataset needs to be ordered for further use
set.1 <- set.1[order(set.1$set.sort, set.1$element.id ), ]
row.names(set.1) <- seq(nrow(set.1))
EDIT: Suppose I wanted to carry out the same task to produce set.2, where set.2 is made up of groups: 1005, 1006, and 1002. I could just copy the above code, and make the relevant changes. However, I would like to know if it is possible to specify a function so that I can pass the necessary changes to it, and have it produce the output dataframe as desired. Perhaps having a function called group.extract, where I could specify something like the following:
groups.2 <- c(1005, 1006, 1002)
group.extract(set.2, groups.2)
Based on the comments provided, it seems that a list is the way to go: the function would be applied to a list whose elements (the sets of groups) can vary.
I’d write this function using match, as follows. Here I’ve hard-coded the names of the columns of the input data frame to use for matching and sorting; those could also be added as optional inputs. The column order of the output is slightly different from yours, but that could easily be changed as well.

You’d use it almost exactly like you propose:
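A minimal sketch of what such a match-based function and its call might look like. The argument names and order (data, groups) are assumptions, the column names group.number and element.id are hard-coded as described above, and the toy data is rebuilt only so the example runs on its own:

```r
# Toy data with the same shape as data.base in the question
set.seed(271828)
n.elements <- c(10, 10, 12, 14, 16, 18)
data.base <- data.frame(
  group.number = rep(1001:1006, n.elements),
  element.id   = sequence(n.elements),
  x1 = round(rnorm(sum(n.elements), 45, 12), digits = 0),
  x2 = round(rbeta(sum(n.elements), 2, 4), digits = 2)
)

# Hypothetical signature: data first, then the groups in the desired order
group.extract <- function(data, groups) {
  # keep only the rows that belong to one of the requested groups
  out <- data[data$group.number %in% groups, ]
  # match() gives each row the position of its group within 'groups'
  out$set.sort <- match(out$group.number, groups)
  # order by requested group order, then by element id within each group
  out <- out[order(out$set.sort, out$element.id), ]
  row.names(out) <- seq_len(nrow(out))
  out
}

groups.2 <- c(1005, 1006, 1002)
set.2 <- group.extract(data.base, groups.2)
head(set.2)
```

In this sketch the function returns the data frame instead of creating set.2 by name, so the output name is chosen by ordinary assignment at the call site.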
Though if you have multiple groups to get, putting them in a list and using lapply would be the way to go.
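The list-plus-lapply approach could look roughly like the following. This is a sketch under the same assumptions as before: group.extract(data, groups) is a hypothetical helper with hard-coded column names, and the names group.sets and all.sets are made up for illustration:

```r
# Toy data with the same shape as data.base in the question
set.seed(271828)
n.elements <- c(10, 10, 12, 14, 16, 18)
data.base <- data.frame(
  group.number = rep(1001:1006, n.elements),
  element.id   = sequence(n.elements),
  x1 = round(rnorm(sum(n.elements), 45, 12), digits = 0),
  x2 = round(rbeta(sum(n.elements), 2, 4), digits = 2)
)

# Hypothetical helper: subset to the requested groups, in the requested order
group.extract <- function(data, groups) {
  out <- data[data$group.number %in% groups, ]
  out$set.sort <- match(out$group.number, groups)
  out <- out[order(out$set.sort, out$element.id), ]
  row.names(out) <- seq_len(nrow(out))
  out
}

# One named list entry per set; each entry lists its groups in the desired order
group.sets <- list(
  set.1 = c(1003, 1004, 1001),
  set.2 = c(1005, 1006, 1002)
)

# Apply the extraction to every entry; the result is a named list of data frames
all.sets <- lapply(group.sets, function(g) group.extract(data.base, g))

names(all.sets)
head(all.sets$set.1)
```

Keeping the results in one named list avoids creating a separate variable per set; individual sets are still available as all.sets$set.1, all.sets$set.2, and so on.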