I need to write a beginner tutorial on the R *apply functions (without using the reshape or plyr packages, at least at first).
I am trying to lapply a simple function over this data frame (because I read that apply is not good for data frames), and I want to use named columns to access the data:
fDist <- function(x1,x2,y1,y2) {
return (0.1*((x1 - x2)^2 + (y1-y2)^2)^0.5)
}
data <- read.table(textConnection("X1 Y1 X2 Y2
1 3.5 2.1 4.1 2.9
2 3.1 1.2 0.8 4.3
"))
data$dist <- lapply(data,function(df) {fDist(df$X1 , df$X2 , df$Y1 , df$Y2)})
I get this error: "$ operator is invalid for atomic vectors". Is it because the data frame is modified by lapply? Is there a better way to do this with $ and named columns?
I resolved my first question with @DWin's answer. But I have another problem (probably a misunderstanding on my part) with a mixed data frame (numeric + character):
In my new use case, I use two functions to compute the distance, because my objective is to compare the distance between one point and all of the other points.
data2 <- read.table(textConnection("X1 Y1 X2 Y2
1 3.5 2.1 4.1 2.9
2 3.1 1.2 0.8 4.3
"))
data2$char <- c("a","b")
fDist <- function(x1,y1,x2,y2) {
return (0.1*((x1 - x2)^2 + (y1-y2)^2)^0.5)
}
fDist2 <- function(fixedX,fixedY,vec) {
fDist(fixedX,fixedY,vec[['X2']],vec[['Y2']])
}
# works with data (dataframe without character), but not with data2 (dataframe with character)
#ok
data$f_dist <- apply(data, 1, function(df) {fDist2(data[1,]$X1,data[1,]$Y1,df)})
#not ok
data2$f_dist <- apply(data2, 1, function(df) {fDist2(data2[1,]$X1,data2[1,]$Y1,df)})
In this case apply is what you need. All of the data columns are of the same type, so you don’t have any worries about losing attributes, which is where apply causes problems. You will need to write your function differently so that it takes just one vector of length 4.
If you want to use the names of the columns in ‘data’, then they need to be spelled correctly.
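The code that accompanied this answer is not preserved here, so the following is only a minimal sketch of the advice above, not the original code. The helper name fDistRow is hypothetical. It assumes the distance is meant to be between (X1, Y1) and (X2, Y2), as in the question's call to fDist.

```r
# Same sample data as in the question; the first field of each row becomes the row name.
data <- read.table(text = "X1 Y1 X2 Y2
1 3.5 2.1 4.1 2.9
2 3.1 1.2 0.8 4.3")

# Hypothetical helper: takes ONE named numeric vector (a row of the data frame).
# apply() hands each row over as an atomic vector, so `$` is not available;
# index by name with [[ ]] instead.
fDistRow <- function(row) {
  0.1 * sqrt((row[["X1"]] - row[["X2"]])^2 + (row[["Y1"]] - row[["Y2"]])^2)
}

# MARGIN = 1 means "apply over rows".
data$dist <- apply(data, 1, fDistRow)
print(data)
```

By contrast, lapply(data, ...) iterates over the columns of the data frame, so each call receives a plain numeric vector such as data$X1, which is why `$` fails inside the function.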
Your updated (and very different) question is easy to resolve. When you use apply, it coerces to the lowest common mode denominator, in this case ‘character’. You have two choices: either 1) add as.numeric to all of your arguments inside the functions, or 2) send only the columns that are needed.
I really do not like how you are passing parameters to this function. Using “[” and “$” within the formals list “just looks wrong.” And you should know that “df” will not be a dataframe, but rather a vector. Because it’s not a dataframe (or a list), you should alter the function inside so that it uses “[” rather than “[[”. Since you only want two of the coordinates, pass only the two (numeric) ones that you will be using.
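The code from the original answer is missing at this point, so what follows is a hedged sketch of option 2 under the assumptions stated in the question: the fixed point is the first row's (X1, Y1), and the variable names fixedX, fixedY and rowsAreCharacter are introduced here only for illustration.

```r
data2 <- read.table(text = "X1 Y1 X2 Y2
1 3.5 2.1 4.1 2.9
2 3.1 1.2 0.8 4.3")
data2$char <- c("a", "b")

# With a character column present, the whole data frame is coerced to a
# character matrix, which is what apply() would iterate over.
rowsAreCharacter <- is.character(as.matrix(data2))

fDist <- function(x1, y1, x2, y2) {
  0.1 * sqrt((x1 - x2)^2 + (y1 - y2)^2)
}

# vec is a named numeric vector holding only X2 and Y2, so "[" is enough.
fDist2 <- function(fixedX, fixedY, vec) {
  fDist(fixedX, fixedY, vec["X2"], vec["Y2"])
}

# Pull the fixed point out once, outside the function call.
fixedX <- data2$X1[1]
fixedY <- data2$Y1[1]

# Send only the two numeric columns that are needed, so nothing is coerced.
data2$f_dist <- apply(data2[, c("X2", "Y2")], 1,
                      function(v) fDist2(fixedX, fixedY, v))
print(data2)
```

Option 1 would instead keep apply(data2, 1, ...) and wrap each value in as.numeric() inside fDist2; subsetting the columns first avoids the round trip through character.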