Is there a fast and clever way that would, let's say, from a DF like this
vec <- data.frame(Names = c("var1","var2","var3","var4","var5","var6","var7",
"var8","var9","var10","var11","var12","var13",
"var14") ,
phase1= runif(14),
phase1.away= runif(14),
phase1_in= runif(14),
phase1_out= runif(14),
phase1.1= runif(14),
phase1.away.1= runif(14),
phase1_in.1= runif(14),
phase1_out.1= runif(14),
phase1.2= runif(14),
phase1.away.2= runif(14),
phase1_in.2= runif(14),
phase1_out.2= runif(14))
give a new DF like this:
- always ordered by phase1.x, with the names of the variables corresponding to the values, and with the phase1_in and phase1_out values but not phase1.away.
What I am doing is simply
vec.o<-vec[with(vec, order(-phase1)),]
d1<-vec.o[c("Names","phase1","phase1_in","phase1_out")]
vec.o<-vec[with(vec, order(-phase1.1)),]
d2<-vec.o[c("Names","phase1.1","phase1_in.1","phase1_out.1")]
cbind(d1,d2)
which is extremely boring and, I am sure, also un-R-ish. Any clever ideas? I am constantly dealing with large data frames, and R seems to be
a bit cumbersome. Is there any good literature one would recommend for these purposes
(loading many variables, creating names for them, operations with those variables, etc.)?
EDIT: corrected for the case where phase1.x goes to 10 and higher.
I presume you have quite a lot more than phase1.1 and phase1.2, so a general solution would use regular expressions to build an id vector from the column names.
It is based on recognizing the last number preceded by a dot and cutting that out. If there is no last number preceded by a dot, it's either the Names variable (which I indicate with -1) or the first phase (which I indicate with 0).
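The id construction described above can be sketched as follows. This is a minimal sketch, not necessarily the exact code of the original answer; the helper names `nms` and `has_suffix` are my own, and the data frame is rebuilt compactly with the same column names as in the question.

```r
# Rebuild a data frame with the same column names as in the question.
set.seed(1)
base <- c("phase1", "phase1.away", "phase1_in", "phase1_out")
cols <- c(base, paste0(base, ".1"), paste0(base, ".2"))
vec  <- data.frame(Names = paste0("var", 1:14),
                   matrix(runif(14 * length(cols)), nrow = 14))
names(vec)[-1] <- cols

nms <- names(vec)

# TRUE where the name ends in ".<digits>", e.g. "phase1_in.2" or "phase1.10".
# "phase1.away" does not match, because ".away" is not a number.
has_suffix <- grepl("\\.[0-9]+$", nms)

# Default 0 = first phase (no numeric suffix).
id <- rep(0L, length(nms))
# Cut out the trailing number; multi-digit suffixes (10 and higher) work too.
id[has_suffix] <- as.integer(sub("^.*\\.([0-9]+)$", "\\1", nms[has_suffix]))
# The Names column belongs to no phase.
id[nms == "Names"] <- -1L

id
# -1 0 0 0 0 1 1 1 1 2 2 2 2
```

Because the pattern is anchored at the end of the name with `$` and captures `[0-9]+`, a name such as `phase1_in.10` gets id 10 rather than 0 or 1.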
Now you have an id vector that can easily select the variables that belong together, so you can loop over the unique values of id, except the first (being -1). Using regular expressions again, you can get whatever variable you want for the construction of a new dataframe. The do.call at the end combines all those dataframes again.
Btw, ordering sub-dataframes goes quite a lot faster than ordering the original dataframe first and then selecting your variables. This is the gain you have in the solution of nullglob.
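The loop and the final do.call could look roughly like this. Again a hedged sketch rather than the original code: the names `pieces`, `key` and `result` are hypothetical, the "away" columns are dropped with a simple `grepl`, and the repeated Names column is renamed per phase (an assumption of mine) so that the combined data frame has unique column names.

```r
# Same example data and id vector as described in the text.
set.seed(1)
base <- c("phase1", "phase1.away", "phase1_in", "phase1_out")
cols <- c(base, paste0(base, ".1"), paste0(base, ".2"))
vec  <- data.frame(Names = paste0("var", 1:14),
                   matrix(runif(14 * length(cols)), nrow = 14))
names(vec)[-1] <- cols

nms <- names(vec)
has_suffix <- grepl("\\.[0-9]+$", nms)
id <- rep(0L, length(nms))
id[has_suffix] <- as.integer(sub("^.*\\.([0-9]+)$", "\\1", nms[has_suffix]))
id[nms == "Names"] <- -1L

# One sub-data-frame per phase id, skipping -1 (the Names column).
pieces <- lapply(sort(unique(id[id >= 0])), function(i) {
  # Columns of this phase, without the "away" variable.
  keep <- nms[id == i & !grepl("away", nms)]
  # The ordering column is the plain phase column: "phase1", "phase1.1", ...
  key  <- grep("^phase[0-9]+(\\.[0-9]+)?$", keep, value = TRUE)

  # Select first, then order: only the small sub-data-frame is reordered.
  sub_df <- vec[c("Names", keep)]
  sub_df <- sub_df[order(-sub_df[[key]]), ]

  # Assumption: give each Names column a phase-specific name.
  names(sub_df)[1] <- if (i == 0) "Names" else paste0("Names.", i)
  rownames(sub_df) <- NULL
  sub_df
})

# Bind all per-phase blocks side by side.
result <- do.call(cbind, pieces)
head(result)
```

Each block is sorted independently in decreasing order of its own phase column, so a row of `result` no longer refers to a single variable; the per-block Names column tells you which variable each value belongs to.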