I have a data frame with a large number of variables. I am creating new variables by adding together some of the old ones. The code I am using to do so is:
name_of_data_frame <- transform(name_of_data_frame, new_variable = var1 + var2 + ....)
When transform comes across an NA in one of the observations, it returns NA in the new variable, even if some of the other variables it was adding were not NA.
For example, if var1 = 4, var2 = 3 and var3 = NA, then var1 + var2 + var3 inside transform gives NA, whereas I would like it to give 7.
I don’t want to recode my NAs to zero within the data frame, as I may need to refer back to the NAs later and don’t want to confuse them with observations that were genuinely 0.
Any help on how to get around R treating NAs in the way described above with the transform function would be great (or if there are alternative functions to use, that would be great also).
Please note that I am not always just summing variables that are next to each other. I am also often dividing, multiplying, subtracting, etc.
My first instinct was to suggest using sum(), since then you can use the na.rm argument. However, this doesn’t work, since sum() reduces its arguments to a single scalar value, not a vector.
This means you need to write a parallel sum function. Let’s call this psum(), similar to the base R functions pmin() and pmax().
With psum() defined, set up some data and use psum() to get the desired vector.
Similarly, you can define a parallel product, pprod().
pprod() provides a general template for what you want to do: create a function that uses apply() to summarize a matrix of inputs into the desired vector.
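The code for these functions did not survive in the text above, so what follows is only a minimal sketch of how psum() and pprod() might look, not necessarily the original definitions. It assumes every input is a numeric vector of the same length. The data frame dat and the column names total and product are hypothetical, chosen to mirror the example in the question.

```r
# Parallel sum: element-wise sum across vectors, analogous to pmin()/pmax().
# Assumption: all inputs are numeric vectors of equal length.
psum <- function(..., na.rm = FALSE) {
  m <- cbind(...)               # one column per input vector
  rowSums(m, na.rm = na.rm)     # sum each row, optionally skipping NAs
}

# Parallel product: the same pattern, using apply() to run prod() on each row.
pprod <- function(..., na.rm = FALSE) {
  m <- cbind(...)
  apply(m, 1, prod, na.rm = na.rm)
}

# Hypothetical data; the first row matches the question (4, 3, NA).
dat <- data.frame(var1 = c(4, 1), var2 = c(3, NA), var3 = c(NA, 2))

dat <- transform(dat,
  total   = psum(var1, var2, var3, na.rm = TRUE),   # 7, 3
  product = pprod(var1, var2, var3, na.rm = TRUE)   # 12, 2
)
```

One caveat with this sketch: when every value in a row is NA and na.rm = TRUE, psum() returns 0 and pprod() returns 1 rather than NA, so those rows may need separate handling if an all-missing result must stay distinguishable from a genuine 0.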