I was using sum(is.na(my.df)) to check whether my data frame contained any NAs, which worked as I expected, but sum(is.nan(my.df)) did not work as I expected.
> my.df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
> my.df
  a   b
1 1   5
2 2  NA
3 3 NaN
> is.na(my.df)
         a     b
[1,] FALSE FALSE
[2,] FALSE  TRUE
[3,] FALSE  TRUE
> is.nan(my.df)
    a     b
FALSE FALSE
> sum(is.na(my.df))
[1] 2
> sum(is.nan(my.df))
[1] 0
Oh dear.
Is there a reason for the inconsistency in behaviour? Is it for a lack of implementation, or is it intentional? What does the return value of is.nan(my.df) signify? Is there a good reason not to use is.nan() on a whole data frame?
In the documentation for is.na() and is.nan(), the argument types seem the same (although they don’t specifically list data frames):
is.na(): x R object to be tested: the default methods handle atomic vectors, lists and pairlists.
is.nan(): x R object to be tested: the default methods handle atomic vectors, lists and pairlists.
Going by ?is.nan, the columns of a data frame are technically “elements of a list”, so is.nan(df) returns a vector with length equal to the number of columns of the data frame, and each entry is TRUE only if the corresponding column consists of a single NaN element.
If you want behavior matching that of is.na, use apply to call is.nan on each column, e.g. sum(apply(my.df, 2, is.nan)).
The answer is 1 rather than 2 because is.nan(NA) is FALSE…
edit: alternatively, you can just turn the data frame into a matrix with as.matrix() before calling is.nan().
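A minimal sketch of the two workarounds, using the data frame from the question. The variable names nan.by.apply and nan.by.matrix are only illustrative, and both approaches assume every column is numeric (a character column would make the intermediate matrix character, and is.nan() would then report FALSE everywhere).

```r
# Same data frame as in the question.
my.df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))

# is.na() flags both NA and NaN; is.nan() flags only NaN.
is.na(c(5, NA, NaN))   # FALSE  TRUE  TRUE
is.nan(c(5, NA, NaN))  # FALSE FALSE  TRUE

# Workaround 1: apply() coerces the data frame to a matrix and calls
# is.nan() on each column, returning a logical matrix.
nan.by.apply <- apply(my.df, 2, is.nan)

# Workaround 2: convert to a matrix explicitly, then call is.nan() once.
nan.by.matrix <- is.nan(as.matrix(my.df))

# Both count only the NaN, not the NA.
sum(nan.by.apply)   # 1
sum(nan.by.matrix)  # 1
```

Neither version calls is.nan() on the data frame itself, so the result does not depend on how is.nan() treats lists.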
update: this behaviour changed shortly (two months) after the question was asked, in R version 2.14 (October 2011): from the NEWS file,