Every time I think I understand working with vectors, what appears to be a simple problem turns my head inside out. Lots of reading and trying different examples hasn’t helped on this occasion. Please spoon-feed me here…
I want to apply two custom functions to each row of a data frame and add the results as two new columns. Here is my sample code:
# Required packages:
library(plyr)
FindMFE <- function(x) {
  MFE <- max(x, na.rm = TRUE)
  MFE <- ifelse(is.infinite(MFE) | (MFE < 0), 0, MFE)
  return(MFE)
}
FindMAE <- function(x) {
  MAE <- min(x, na.rm = TRUE)
  MAE <- ifelse(is.infinite(MAE) | (MAE > 0), 0, MAE)
  return(MAE)
}
FindMAEandMFE <- function(x) {
  # I know this next line is wrong...
  z <- apply(x, 1, FindMFE, FindMFE)
  return(z)
}
df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1), Bar2 = c(3, 1, 3, -2, -3, -1))
df1 = transform(df1,
  FindMAEandMFE(df1)
)
# df1 should end up with the following data...
# Bar1 Bar2 MFE MAE
#    1    3   3   0
#    2    1   2   0
#    3    3   3   0
#   -3   -2   0  -3
#   -2   -3   0  -3
#   -1   -1   0  -1
It would be great to get an answer using the plyr library and a more base-R-like approach. Both will aid my understanding. Of course, please point out where I’m going wrong if it’s obvious. 😉
Now back to the help files for me!
Edit: I would like a multivariate solution as column names may change and expand over time. It also allows re-use of the code in future.
I think you are overcomplicating this. What is wrong with two separate apply() calls? There is, however, a far better way to do what you are doing here that involves no looping/apply calls. I’ll deal with these separately, but the second solution is preferable as it is truly vectorised.
Two apply calls version
First, two separate apply() calls using only base R functions:
Which gives:
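A minimal sketch of the two calls and their result, assuming the FindMFE() and FindMAE() definitions from the question (repeated here so the block runs on its own). The name res is only illustrative; it is used so that df1 itself is not overwritten.

```r
# Helper functions as defined in the question
FindMFE <- function(x) {
  MFE <- max(x, na.rm = TRUE)
  ifelse(is.infinite(MFE) | (MFE < 0), 0, MFE)
}
FindMAE <- function(x) {
  MAE <- min(x, na.rm = TRUE)
  ifelse(is.infinite(MAE) | (MAE > 0), 0, MAE)
}

df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

# One apply() call per new column; MARGIN = 1 means "loop over rows"
res <- transform(df1,
                 MFE = apply(df1, 1, FindMFE),
                 MAE = apply(df1, 1, FindMAE))
print(res)
#   Bar1 Bar2 MFE MAE
# 1    1    3   3   0
# 2    2    1   2   0
# 3    3    3   3   0
# 4   -3   -2   0  -3
# 5   -2   -3   0  -3
# 6   -1   -1   0  -1
```

Because apply() is handed the whole of df1, this keeps working if more Bar columns are added later.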
OK, looping over the rows of df1 twice is perhaps a little inefficient, but even for big problems you have already spent more time thinking about how to do this cleverly in a single pass than you will save by doing it that way.
Using vectorised functions pmax() and pmin()
So a better way of doing this is to note the pmax() and pmin() functions and realise that they can do what each of the apply(df1, 1, FindFOO) calls was doing. For example, a call along the lines of pmax(0, df1$Bar1, df1$Bar2, na.rm = TRUE) would be MFE from your Question. This is very simple to work with if you have two columns and they are Bar1 and Bar2, or the first 2 columns of df1, always. But it is not very general; what if you have multiple columns you want to compute this over? pmax(df1[, 1:2], na.rm = TRUE) won’t do what we want, because the whole data frame arrives as a single argument rather than one argument per column.
The trick to getting a general solution using pmax() and pmin() is to use do.call() to arrange the calls to those two functions for us. Updating your functions to use this idea gives versions that work for any number of columns.
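One way the do.call() idea might be sketched is shown below. The names FindMFE2() and FindMAE2() are hypothetical, chosen only to avoid overwriting the originals, and passing 0 as an extra argument is an assumption that stands in for the clamping done by ifelse() in the question.

```r
df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

# Two-column case: parallel (row-wise) maximum of 0, Bar1 and Bar2
mfe_two_cols <- pmax(0, df1$Bar1, df1$Bar2, na.rm = TRUE)
# 3 2 3 0 0 0

# General case (hypothetical names): a data frame is a list of columns,
# so do.call() can pass every column to pmax()/pmin() as its own argument.
# The leading 0 caps MFE from below and MAE from above (an assumption
# replacing the question's ifelse() step).
FindMFE2 <- function(x) {
  do.call(pmax, c(list(0), as.list(x), list(na.rm = TRUE)))
}
FindMAE2 <- function(x) {
  do.call(pmin, c(list(0), as.list(x), list(na.rm = TRUE)))
}

mfe <- FindMFE2(df1)  # 3 2 3 0 0 0
mae <- FindMAE2(df1)  # 0 0 0 -3 -3 -1
```

Each function makes a single vectorised call over whole columns, so nothing loops over rows at the R level.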
And not an apply() in sight. If you want to do this in a single step, this is now much easier to wrap in a single function, which can then be used to add both columns to df1.
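A possible wrapper, again only a sketch: FindMAEandMFE2() is a hypothetical name, and the pmax()/pmin() calls with a 0 argument are an assumption based on the vectorised idea described above, not necessarily the code originally posted.

```r
# Hypothetical wrapper: returns MFE and MAE together as a two-column data frame
FindMAEandMFE2 <- function(x) {
  cols <- as.list(x)  # one list element per column
  data.frame(
    MFE = do.call(pmax, c(list(0), cols, list(na.rm = TRUE))),
    MAE = do.call(pmin, c(list(0), cols, list(na.rm = TRUE)))
  )
}

df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

# Choose the input columns by name so the call still works as columns are added
bars <- c("Bar1", "Bar2")
out <- cbind(df1, FindMAEandMFE2(df1[bars]))
print(out)
```

Selecting the columns with df1[bars] rather than df1[, bars] keeps the input a data frame even when only one column is chosen.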