I have a huge data frame. One column is an integer ranging from 1

Question

Asked: June 13, 2026

I have a huge data frame. One column is an integer ranging from 1 to 2.
What I need is a way to find runs of consecutive rows in which this column holds a certain value at least a given number of times, subset these rows, and process them into graphs later.

I attached a small example, which does at least some of the desired work:
I am able to print out the subsets I am looking for. But two questions remain:

I guess there are much smarter methods in R than applying a “for” loop over the complete data.frame. Any hints?
Which command do I have to put where the “print” command is now, in order to store the temporary data.frame? I guess I need a list because the subsets differ in length…

I already had a look at aggregate or ddply, but could not come up with a solution.

Any help is highly appreciated.

test<-c(rep(1,3),rep(2,5),rep(1,3),rep(2,3),rep(1,3),rep(2,8),rep(1,3)) 
letters<-c("a","b","c","d")
a1<-as.data.frame(cbind(test,letters))

BZ <- 2    # The value to look for
n_BZ <- 4  # The minimum number of consecutive appearances

k <- 1  # Index of the list item in which the subset will be stored

for (i in 2:nrow(a1)) {
  if (a1$test[i-1] != BZ && a1$test[i] == BZ) {         # When "test" BECOMES "2" ...
    t_temp <- a1[i, ]                                   # ... start writing a temporary data.frame
  } else if (a1$test[i-1] == BZ && a1$test[i] == BZ) {  # When "test" REMAINS "2" ...
    t_temp <- rbind(t_temp, a1[i, ])                    # ... continue writing the temporary data.frame
  } else if (a1$test[i-1] == BZ && a1$test[i] != BZ) {  # When "test" STOPS BEING "2" ...
    if (nrow(t_temp) >= n_BZ) {                         # ... check that the temporary data.frame has at least n_BZ rows
      print(t_temp)                                     # ... print it (desired: store it in list item k)
      k <- k + 1                                        # ... increase k
    }
  } else {                                              # "test" is not "2" in either row
    t_temp <- NULL                                      # reset
  }
}

1 Answer

Editorial Team · Answer 1 · 2026-06-13T12:38:03+00:00

The rle function is really handy for stuff like this. It takes an atomic vector and returns a list with elements lengths and values, where lengths contains the run length of each value in values.
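As a small illustration of what rle returns, here is a sketch that applies it to the test vector from the question. The name runs is only chosen for this example.

```r
# The "test" vector from the question: runs of 1s and 2s of varying length
test <- c(rep(1, 3), rep(2, 5), rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 8), rep(1, 3))

runs <- rle(test)

runs$values   # the value of each run:  1 2 1 2 1 2 1
runs$lengths  # the length of each run: 3 5 3 3 3 8 3
```

Each position in values pairs with the same position in lengths, so the second run is five 2s and the sixth run is eight 2s.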

Since the call to cbind in your example coerces the test column to a factor (or to character in R 4.0.0 and later, where stringsAsFactors defaults to FALSE), I first converted it to numeric:

a1 <- within(a1, test <- as.numeric(as.character(test)))
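The detour through as.character matters when the column is a factor. A minimal sketch with a made-up factor f (not part of the question's data) shows the difference:

```r
# A hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                # internal level codes: 1 2 1
values <- as.numeric(as.character(f))  # the labels as numbers: 10 20 10
```

For the question's data the level codes happen to coincide with the values 1 and 2, but converting via as.character is the safer habit and also works if the column is already character.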

Then the result can be obtained in a nice (essentially) one-liner:

with(rle(a1$test),
    split(a1, rep(seq_along(lengths), lengths))[values == BZ & lengths >= n_BZ]
)
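To see what the one-liner does step by step, and how the result answers the "store it in a list" part of the question, here is an unrolled sketch. It rebuilds the example data with data.frame so that test is numeric from the start (an assumption made to keep the block self-contained); the names runs, run_id, keep and subsets are only illustrative.

```r
# Rebuild the example data; data.frame() keeps "test" numeric
test <- c(rep(1, 3), rep(2, 5), rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 8), rep(1, 3))
a1 <- data.frame(test = test,
                 letters = rep(c("a", "b", "c", "d"), length.out = length(test)))

BZ   <- 2  # the value to look for
n_BZ <- 4  # the minimum run length

runs   <- rle(a1$test)
run_id <- rep(seq_along(runs$lengths), runs$lengths)  # run number for every row
keep   <- runs$values == BZ & runs$lengths >= n_BZ    # one TRUE/FALSE per run

# split() gives one data.frame per run; keep only the qualifying runs
subsets <- split(a1, run_id)[keep]

length(subsets)        # number of qualifying runs
sapply(subsets, nrow)  # rows in each stored subset
```

The result is an ordinary list of data.frames, named by run number, so each subset can be processed later with lapply or a loop, for example to draw one graph per run.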
