I’ve been having this strange problem with apply lately. Consider the following example:
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))
head(df)
speed dist foo
1 4 2 E
2 4 10 E
3 7 4 B
4 7 22 E
5 8 16 D
6 9 10 C
I want to use apply to apply a function fun (say, mean) on each column of that data.frame. If the data.frame contains only numeric values, I do not have any problem:
apply(cars, 2, mean)
speed dist
15.40 42.98
But when trying with my data.frame containing numeric and character data, it seems to fail:
apply(df, 2, mean)
speed dist foo
NA NA NA
Warning messages:
1: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
Of course, I was expecting to get NA for the character column, but I would like to get values for the numeric columns anyway.
sapply(df, class)
speed dist foo
"numeric" "numeric" "factor"
Any pointers would be appreciated as I’m feeling like I’m missing something very obvious here!
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
The documentation in ?apply says that when X is a data frame, apply first coerces it to an array via as.matrix. Matrices can only be of a single type in R. When the data frame is coerced to a matrix, everything ends up as a character if there is even a single character column.
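To see the coercion directly, here is a small sketch that rebuilds the data frame from the question and calls as.matrix on it by hand. The name m is just for illustration, and this only approximates what apply does internally:

```r
# Rebuild the example data frame from the question
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# Coerce by hand, roughly as apply() does before calling the function
m <- as.matrix(df)

typeof(m)            # the whole matrix is stored as character
class(m[, "speed"])  # so even the originally numeric column is character
```

Because every column of m is now character, mean has nothing numeric to work with, which is why all three results come back as NA.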
I guess I owe you a description of an alternative, so here you go. Data frames are really just lists, so if you want to apply a function to each column, use lapply or sapply instead.
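As a rough sketch of that suggestion, using the data frame from the question: sapply works column by column, so the numeric columns keep their type. The names num_cols and col_means are made up for this example, and dropping the non-numeric column first is just one way to avoid the warning for foo:

```r
# Rebuild the example data frame from the question
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# Logical vector marking which columns are numeric
num_cols <- sapply(df, is.numeric)

# Apply mean to the numeric columns only; each column is passed as-is,
# with no coercion to a character matrix
col_means <- sapply(df[num_cols], mean)
col_means
```

Calling sapply(df, mean) on the full data frame should also work for speed and dist, with NA and a warning for foo only.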