The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8749075
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 13, 2026


I have a huge data frame. One column is an integer ranging from 1 to 2.
What I need is a way to find runs of consecutive rows that have a certain value in this column and are at least a given number of rows long, subset these rows, and process them later into graphs.

I attached a small example that does at least some of the desired work:
I am able to print out the subsets I am looking for. But two questions remain:

  • I guess there are much smarter methods in R than applying a “for” loop over the complete data.frame. Any hints?
  • Which command do I have to put where the “print” command is now, to store each temporary data.frame? I guess I need a list because the subsets have different lengths…

I already had a look at aggregate and ddply, but could not come up with a solution.

Any help is highly appreciated.

test <- c(rep(1,3), rep(2,5), rep(1,3), rep(2,3), rep(1,3), rep(2,8), rep(1,3))
letters <- c("a","b","c","d")
a1 <- as.data.frame(cbind(test, letters))

BZ <- 2     # The value to look for
n_BZ <- 4   # The minimum number of consecutive appearances

k <- 1      # Index of the list item in which the subset should be stored

for (i in 2:nrow(a1)) {
  if (a1$test[i-1] != BZ & a1$test[i] == BZ) {          # When "test" BECOMES 2
    t_temp <- a1[i, ]                                   # ... start a temporary data.frame
  } else if (a1$test[i-1] == BZ & a1$test[i] == BZ) {   # When "test" REMAINS 2
    t_temp <- rbind(t_temp, a1[i, ])                    # ... keep appending to it
  } else if (a1$test[i-1] == BZ & a1$test[i] != BZ) {   # When "test" STOPS being 2
    if (nrow(t_temp) >= n_BZ) {                         # ... check it has at least n_BZ rows
      print(t_temp)                                     # ... print it (desired: store it as list item k)
      k <- k + 1                                        # ... increase k
    }
  } else {                                              # When "test" is not 2 in either row
    t_temp <- NULL                                      # ... reset
  }
}
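For the second question, a minimal sketch assuming you keep the loop: create an empty list before the loop and assign each qualifying subset to list item k where the print call is now. The names `subsets` and the `data.frame()` construction (used instead of `cbind` so that `test` stays numeric) are assumptions for illustration, not part of the original code.

```r
# Example data as in the question, but built with data.frame() so that
# the test column stays numeric (assumption made for clarity).
test <- c(rep(1,3), rep(2,5), rep(1,3), rep(2,3), rep(1,3), rep(2,8), rep(1,3))
a1 <- data.frame(test = test,
                 letters = rep(c("a","b","c","d"), length.out = length(test)))

BZ   <- 2   # the value to look for
n_BZ <- 4   # minimum number of consecutive rows

subsets <- list()   # hypothetical name: one list element per qualifying run
k <- 1
t_temp <- NULL

for (i in 2:nrow(a1)) {
  if (a1$test[i-1] != BZ && a1$test[i] == BZ) {
    t_temp <- a1[i, ]                        # a run of BZ starts
  } else if (a1$test[i-1] == BZ && a1$test[i] == BZ) {
    t_temp <- rbind(t_temp, a1[i, ])         # the run continues
  } else if (a1$test[i-1] == BZ && a1$test[i] != BZ) {
    if (nrow(t_temp) >= n_BZ) {
      subsets[[k]] <- t_temp                 # store instead of print
      k <- k + 1
    }
    t_temp <- NULL                           # start fresh for the next run
  }
}

length(subsets)  # [1] 2
```

Like the loop in the question, this misses a run that starts in row 1 or is still open at the last row.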

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 12:38 pm

    The rle function is really handy for stuff like this. It takes an atomic vector and returns a list with elements lengths and values, where lengths contains the run length of each value in values.
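    For the test column from the question, rle finds seven alternating runs. A quick check, using only base R (`rle` and its inverse `inverse.rle`):

```r
# The test column from the question, as a plain numeric vector.
test <- c(rep(1,3), rep(2,5), rep(1,3), rep(2,3), rep(1,3), rep(2,8), rep(1,3))

r <- rle(test)
r$values   # 1 2 1 2 1 2 1
r$lengths  # 3 5 3 3 3 8 3

# inverse.rle() rebuilds the original vector from the runs.
identical(inverse.rle(r), test)  # TRUE
```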

    Since cbind in your example builds a character matrix, as.data.frame turns the test column into a factor (or, from R 4.0.0 on, into a character column), so I first converted it to numeric:

    a1 <- within(a1, test <- as.numeric(as.character(test)))
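    The as.character() step matters when the column is a factor: as.numeric() on a factor returns the internal level codes rather than the numbers shown as labels. The vector `f` below is a made-up illustration; in the question's data the levels are "1" and "2", so the codes happen to coincide with the values, but in general they do not.

```r
# A factor whose labels are numbers; levels are sorted as strings.
f <- factor(c("10", "5", "10"))
levels(f)                        # "10" "5"
as.numeric(f)                    # 1 2 1   (level codes, not values)
as.numeric(as.character(f))      # 10 5 10 (the actual values)
```

For a character column (R 4.0.0 and later), as.character() is a no-op and the conversion still works.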
    

    Then the result can be obtained in a nice (essentially) one-liner:

    with(rle(a1$test),
        split(a1, rep(seq_along(lengths), lengths))[values == BZ & lengths >= n_BZ]
    )
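    To see what the one-liner does, here it is unpacked into three steps. The intermediate names `r`, `run_id`, `pieces`, `keep`, and `result` are introduced only for illustration, and the data frame is rebuilt with a numeric test column, which is equivalent to a1 after the within() conversion.

```r
# Example data with a numeric test column.
test <- c(rep(1,3), rep(2,5), rep(1,3), rep(2,3), rep(1,3), rep(2,8), rep(1,3))
a1 <- data.frame(test = test,
                 letters = rep(c("a","b","c","d"), length.out = length(test)))
BZ   <- 2
n_BZ <- 4

r <- rle(a1$test)

# 1. Give every row the index of the run it belongs to: 1 1 1 2 2 2 2 2 3 ...
run_id <- rep(seq_along(r$lengths), r$lengths)

# 2. Split the data frame into one piece per run (a list named "1" ... "7").
pieces <- split(a1, run_id)

# 3. Keep only the runs of BZ that are at least n_BZ rows long.
keep <- r$values == BZ & r$lengths >= n_BZ
result <- pieces[keep]

names(result)              # "2" "6"
lapply(result, rownames)   # rows 4-8 and rows 18-25 of a1
```

The one-liner performs these same steps inside with(), which lets lengths and values be referenced without the r$ prefix. Each element of result is an ordinary data frame, so you can loop over it with lapply() to draw one graph per run.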
    


Related Questions

I have huge amount of data from database, i need to get or store
I have a Huge data file and I only need specific data from this
I have huge data which is static. I need to save it within the
I have a huge number of logfiles stored in HDFS which look like the
I have this huge data frame that has servernames, Date, CPU, memory as the
Here is the situation: I have a huge data set that I need quick
I have a huge data, with 7 colums and 20000 rows. I let Matlab
I have done some research for The bast way to insert huge data into
I have a huge data set and I want to extract the rows which
I have a huge data set with genotypic information from different populations. I would

© 2021 The Archive Base. All Rights Reserved