I often find myself having to perform repetitive tasks in R. It gets extremely frustrating having to constantly run the same function on one or more data structures over and over again.
For example, let’s say I have three separate data frames in R, and I want to delete the rows in each data frame that contain a missing value. With three data frames, it’s not all that difficult to run na.omit() on each of them, but it becomes extremely inefficient
when one has a hundred similar data structures that all require the same action.
df1 <- data.frame(Region=c("Asia","Africa","Europe","N.America","S.America",NA),
                  variable=c(2004,2004,2004,2004,2004,2004), value=c(35,20,20,50,30,NA))
df2 <- data.frame(Region=c("Asia","Africa","Europe","N.America","S.America",NA),
                  variable=c(2005,2005,2005,2005,2005,2005), value=c(55,350,40,90,99,NA))
df3 <- data.frame(Region=c("Asia","Africa","Europe","N.America","S.America",NA),
                  variable=c(2006,2006,2006,2006,2006,2006), value=c(300,200,200,500,300,NA))
tot04 <- na.omit(df1)
tot05 <- na.omit(df2)
tot06 <- na.omit(df3)
What are some general guidelines for dealing with repetitive tasks in R?
Yes, I recognise that the answer to this question is specific to the task that one faces, but I’m just asking about general things that a user should consider when they have a repetitive task.
As a general guideline, if you have several objects that you want to apply the same operations to, you should collect them into one data structure. Then you can use loops, lapply/sapply, etc. to do the operations in one go. In this case, instead of having separate data frames
df1, df2, etc., you could put them into a list of data frames and then run na.omit on all of them:
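For the three data frames in the question, that could look roughly like the sketch below. The names `regions`, `dfs` and `cleaned` are made up for illustration, and the list element names simply reuse the `tot04`, `tot05`, `tot06` names from the question.

```r
# Rebuild the three example data frames from the question.
regions <- c("Asia", "Africa", "Europe", "N.America", "S.America", NA)
df1 <- data.frame(Region = regions, variable = 2004, value = c(35, 20, 20, 50, 30, NA))
df2 <- data.frame(Region = regions, variable = 2005, value = c(55, 350, 40, 90, 99, NA))
df3 <- data.frame(Region = regions, variable = 2006, value = c(300, 200, 200, 500, 300, NA))

# Collect them into a single named list (illustrative names).
dfs <- list(tot04 = df1, tot05 = df2, tot06 = df3)

# Apply na.omit() to every element; the result is again a named list.
cleaned <- lapply(dfs, na.omit)

# Check how many rows survive in each data frame.
sapply(cleaned, nrow)

# Individual results are still easy to reach by name.
cleaned$tot04
```

The same two lines (`list(...)` followed by `lapply(...)`) work unchanged whether the list holds three data frames or a hundred. If the data frames already exist as separate objects in the workspace, something like `mget(c("df1", "df2", "df3"))` is one way to gather them into a list after the fact.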