I have a data.frame like this:
(t=structure(list(count = c(NA, 2, NA, NA, NA, 8, NA, NA, NA)), .Names = "count", row.names = c(NA,-9L), class = "data.frame"))
count
1 NA
2 2
3 NA
4 NA
5 NA
6 8
7 NA
8 NA
9 NA
It is great that R has the NA value, but sometimes it bites me. I often forget about it and try to subset like this:
> t[t$count>=1,]
[1] NA 2 NA NA NA 8 NA NA NA
The output includes all the NA rows, which I don’t want.
After an hour of bug hunting I changed the code to the following, which gives me what I want (imagine a large data frame with lots of non-NA results and only a few “well-hidden” NAs):
> t[t$count>=1&!is.na(t$count),]
[1] 2 8
1. Is there a feature of the “as.integer” function that would let me do something like this:
t[as.integer.EXCLUDE.NA(t$count)>=1,]
I would want to use such a feature in other as.xxxx functions as well. Basically, I want to force R to stop thinking like a statistician and treat NA differently, e.g., like NULL. (I am not sure NULL would solve my issue; t$count[3]<-NULL did not work for some reason.)
2. Alternatively, how would I run something like this:
transform(t, replace all NAs from count columns with 0)
or even better
transform(t, replace all NA from all numeric columns with 0 in t)
3. Any general comments on making R forget about NAs are welcome.
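As a rough sketch of what item 2 could look like in base R: the column `label` is a hypothetical non-numeric column added only for illustration, and the data frame is named `df` here instead of `t` so that it does not mask the base function `t()`. This is one possible approach, not the only one.

```r
# Example data: the count column from above, plus a hypothetical
# non-numeric column (label) that should be left untouched.
df <- data.frame(
  count = c(NA, 2, NA, NA, NA, 8, NA, NA, NA),
  label = c("a", NA, "c", "d", "e", "f", "g", "h", "i")
)

# Replace the NAs in a single column with 0.
df0 <- transform(df, count = ifelse(is.na(count), 0, count))

# Replace the NAs in every numeric column with 0,
# leaving all other columns as they are.
num_cols <- vapply(df, is.numeric, logical(1))
df_all <- df
df_all[num_cols] <- lapply(df_all[num_cols], function(col) {
  col[is.na(col)] <- 0
  col
})
```

Whether replacing NA with 0 is appropriate depends on the data, since a 0 is then indistinguishable from a genuinely observed zero.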
I do not like the choices that were made at the time of designing how “[” handles NA values either. The approach I take when I want to extract values using a logical test is to wrap the logical expression in which(). This converts the result to a set of numbers, and indexing succeeds without dragging along the unwanted NAs.
I also use subset(), since it handles NAs in the same manner as which(logical). The one gotcha is when which() is used with a “-” sign to retrieve the complement set. If there are no elements in the set satisfying the logical condition, there will also be no elements in the -which(logical) form. So I just do not use the -which combo.
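The points above can be illustrated with a minimal sketch on the data from the question. The data frame is called `df` here (my renaming, to avoid masking `t()`), and the vector `x` is a made-up example for the complement case.

```r
# Same data as in the question.
df <- data.frame(count = c(NA, 2, NA, NA, NA, 8, NA, NA, NA))

# Plain logical indexing: every NA in the test yields an NA in the result.
with_na <- df[df$count >= 1, ]

# which() returns only the positions where the test is TRUE,
# so the NAs are not carried along.
keep <- which(df$count >= 1)
no_na <- df[keep, ]

# subset() likewise drops rows where the test evaluates to NA.
sub <- subset(df, count >= 1)

# The gotcha: if nothing matches, which() returns integer(0),
# and -integer(0) is still integer(0), so nothing is selected.
x <- c(1, 2, 3)
wrong <- x[-which(x > 10)]     # empty, not all of x

# Negating the test inside which() gives the intended complement.
right <- x[which(!(x > 10))]   # all three elements
```

Here `no_na` holds the values 2 and 8, and `sub` is a data frame with rows 2 and 6. The `wrong` result is an empty vector even though no element of `x` exceeds 10, which is why the complement is safer to express by negating the condition itself.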