I have a huge data frame. One column is an integer ranging from 1 to 2.
What I need is a way to find runs of consecutive rows in which this column holds a certain value for at least a given number of rows, subset these rows, and process them into graphs later.
I attached a small example, which does at least some of the desired work:
I am able to print out the subsets I am looking for. But two questions remain:
- I guess there are much smarter ways in R than applying a “for” loop over the complete data.frame. Any hints?
- Which command do I have to put where the “print” command is now, in order to store the temporary data.frame? I guess I need a list, since the subsets differ in length…
I already had a look at aggregate or ddply, but could not come up with a solution.
Any help is highly appreciated.
test<-c(rep(1,3),rep(2,5),rep(1,3),rep(2,3),rep(1,3),rep(2,8),rep(1,3))
letters<-c("a","b","c","d")
a1<-as.data.frame(cbind(test,letters))
BZ<-2 #The variable to look for
n_BZ<-4 #The minimum number of appearances
k<-1 # A variable to be used as a list item index in which the subset will be stored
for (i in 2:nrow(a1)){
if (a1$test[i-1]!=BZ & a1$test[i]==BZ) # When "test" BECOMES "2"
{t_temp<-a1[i,]} #... start writing a temporary array
else if (a1$test[i-1]==BZ & a1$test[i]==BZ) # When "test" REMAINS "2"
{t_temp<-rbind(t_temp,a1[i,])} #... continue writing a temporary array
else if (a1$test[i-1]==BZ & a1$test[i]!=BZ) # When "test" ENDS BEING "2"
{if (nrow(t_temp)>n_BZ) #... check if the temporary array has more rows than demanded
{print(t_temp) #... print the array (desired: put the array to a list item k)
k<-k+1}} #... increase k
else # If array too small
{t_temp<-NULL} # reset
}
The rle function is really handy for stuff like this. It takes an atomic vector and returns a list with elements lengths and values, where lengths contains the run length of each value in values.

Since the call to cbind in your example coerces the test column to factor, I first converted it to numeric. Then the result can be obtained in a nice (essentially) one-liner.
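A minimal sketch of how this approach could look, using the data from the question. The names r, starts, ends, keep and runs are only illustrative, and the comparison r$lengths >= n_BZ assumes that n_BZ is meant as a minimum; use > instead to match the strict comparison in the loop.

```r
# Rebuild the example data as in the question
test <- c(rep(1, 3), rep(2, 5), rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 8), rep(1, 3))
a1 <- as.data.frame(cbind(test, letters = c("a", "b", "c", "d")))

# cbind() builds a character matrix, so test is no longer numeric;
# going through as.character() works whether it arrived as factor or character
a1$test <- as.numeric(as.character(a1$test))

BZ   <- 2  # the value to look for
n_BZ <- 4  # assumed here to be the minimum run length

r      <- rle(a1$test)             # run lengths and the value of each run
ends   <- cumsum(r$lengths)        # last row index of each run
starts <- ends - r$lengths + 1     # first row index of each run
keep   <- r$values == BZ & r$lengths >= n_BZ

# One data.frame per qualifying run, collected in a list
runs <- Map(function(s, e) a1[s:e, ], starts[keep], ends[keep])
```

Since runs is a plain list, the subsets may differ in length, and each one can be accessed with runs[[1]], runs[[2]] and so on, or passed to lapply for plotting.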