I would like to perform many modifications on the columns of a data frame. However, with a large number of columns and transformations required, I would like to avoid typing the data frame name over and over.
In a SAS data step, you can create a variable and refer to it right after defining it, all within the same step:
data A;
set A;
varA = varB > 1;
varC = varA + varB;
....
run;
Is it possible to do this in R?
One way I can think of is to use attach(), create hundreds of arrays, and then cbind() them before detach(). I know many R veterans advise against using attach(). But I need to do heavy data manipulation (hundreds of new variables), and calling transform(df,) on every one of them sequentially would be quite cumbersome.
For example:
attach(A)
varA <- varB > 1
varC <- varA + varB
A <- cbind(varA, varB, varC)
detach()
But I am not sure if it is the best way to do this in R.
You can use plyr and mutate().
Or within() in base R. Notice that within() returns the columns you create in reverse order.
By far and away my favourite is data.table and :=.
Currently, := is most easily used only once per call to [. There are ways around this, but I think the string of [ calls is not too hard to follow (and it will be MUCH MUCH faster than mutate() or any approach that uses data.frames).
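The three options named in this answer (mutate from plyr, within in base R, and data.table with :=) could look roughly like the sketch below, applied to the varA/varC example from the question. It assumes the plyr and data.table packages are installed; the toy data frame A and the result names A1, A2 and DT are made up for illustration.

```r
# Hypothetical toy data standing in for the asker's data frame A.
A <- data.frame(varB = c(0, 1, 2, 3))

# 1. plyr::mutate(): a column created earlier in the call (varA)
#    can be used by a later one (varC).
library(plyr)
A1 <- mutate(A, varA = varB > 1, varC = varA + varB)

# 2. Base R within(): ordinary assignments evaluated inside the data frame.
#    New columns are appended in reverse order of creation.
A2 <- within(A, {
  varA <- varB > 1
  varC <- varA + varB
})

# 3. data.table: := adds a column by reference; here one column per [ call,
#    chained so the second call can see varA.
library(data.table)
DT <- as.data.table(A)
DT[, varA := varB > 1][, varC := varA + varB]
```

In all three cases the data frame name appears once per step rather than once per column reference, which is the closest analogue to the SAS data step in the question.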