I have a data frame with some numeric columns. Some rows have a 0 value, which should be treated as null in statistical analysis. What is the fastest way to replace all the 0 values with NULL in R?
Replacing all zeroes with NA:

    df[df == 0] <- NA
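
As a minimal sketch of that replacement in action, the following uses two small hypothetical data frames (the names `df`, `mixed`, and their columns are made up for illustration). The second part assumes you may also have non-numeric columns and want to leave them alone; that restriction is not part of the answer itself.

```r
# Hypothetical toy data; the column names are illustrative only.
df <- data.frame(a = c(1, 0, 3), b = c(0, 5, 6), c = c(7, 8, 0))

# df == 0 is a logical matrix with the same dimensions as df.
zero_mask <- df == 0

# Assigning through the logical matrix replaces every flagged cell with NA.
df[zero_mask] <- NA
print(df)

# Summary functions can now skip the former zeroes.
colMeans(df, na.rm = TRUE)

# Assumption: a data frame that also holds a non-numeric column.
mixed <- data.frame(id = c("x", "y", "z"), v = c(0, 2, 0))

# Restrict the replacement to numeric columns only.
num <- vapply(mixed, is.numeric, logical(1))
mixed[num][mixed[num] == 0] <- NA
print(mixed)
```

Restricting to numeric columns is a cautious choice: comparing factor or character columns against 0 may produce warnings or unintended matches.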
Explanation
1. NULL is not what you want to replace zeroes with. As it says in ?'NULL', "NULL represents the null object in R", which is unique and, I guess, can be seen as the most uninformative and empty object.[1] Then it becomes not so surprising that c(1, NULL, 2) has only two elements. That is, R does not reserve any space for this null object.[2]

Meanwhile, looking at ?'NA' we see that "NA is a logical constant of length 1 which contains a missing value indicator". Importantly, NA is of length 1, so R reserves some space for it. E.g., c(1, NA, 2) has three elements. Also, the data frame structure requires all the columns to have the same number of elements, so there can be no “holes” (i.e., NULL values).

Now you could replace zeroes by NULL in a data frame in the sense of completely removing all the rows containing at least one zero. When using, e.g., var, cov, or cor, that is actually equivalent to first replacing zeroes with NA and setting the value of use to "complete.obs". Typically, however, this is unsatisfactory as it leads to extra information loss.

2. Instead of running some sort of loop, the solution uses the vectorized comparison df == 0. df == 0 returns (try it) a logical matrix of the same size as df, with the entries TRUE and FALSE. Further, we are also allowed to pass this matrix to the subsetting operator [...] (see ?'['). Lastly, while the result of df[df == 0] is perfectly intuitive, it may seem strange that df[df == 0] <- NA gives the desired effect. The assignment operator <- is indeed not always so smart and does not work in this way with some other objects, but it does so with data frames; see ?'<-'.

[1] The empty set in set theory feels somehow related.
[2] Another similarity with set theory: the empty set is a subset of every set, but we do not reserve any space for it.
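
The points about NULL, NA, and "complete.obs" can be checked with a short sketch. The data frame below is hypothetical, and the names `dropped` and `marked` are introduced only for this illustration.

```r
# NULL has length 0 and vanishes when combined with other values.
length(NULL)                          # 0
length(c(1, NULL, 2))                 # 2
nrow(data.frame(x = c(1, NULL, 2)))   # 2

# NA has length 1 and keeps its slot.
length(NA)                            # 1
length(c(1, NA, 2))                   # 3

# Hypothetical toy data with zeroes standing in for missing values.
df <- data.frame(x = c(1, 0, 3, 4, 5, 6),
                 y = c(2, 4, 0, 8, 9, 1),
                 z = c(1, 3, 2, 0, 6, 4))

# Option A: drop every row that contains at least one zero.
dropped <- df[rowSums(df == 0) == 0, ]

# Option B: mark zeroes as NA and let cor() use complete observations only.
marked <- df
marked[marked == 0] <- NA

# Both routes use the same three complete rows, so the results agree.
all.equal(cor(dropped), cor(marked, use = "complete.obs"))

# Information loss: each column still has 5 usable values after marking,
# but only 3 rows survive when incomplete rows are dropped.
colSums(!is.na(marked))
nrow(dropped)
```

Keeping the NA values in place lets each later computation decide how to handle them, for example through na.rm = TRUE, instead of discarding whole rows up front.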