Editorial Team
Asked: June 10, 2026

I have a big time series, full, in one data frame and a list of timestamps in a second data frame, test. I need to subset full to the data points surrounding each timestamp in test. My first instinct (as an R noob) was to write the line below, which was wrong:

subs <- subset(full,(full$dt>test$dt-i) & (full$dt<test$dt+i))

Looking at the result, I realized that R compares the two vectors element by element (recycling the shorter one) instead of checking every timestamp against every row, which gives the wrong result. My alternative is to write a loop like the one below:

subs<-data.frame()
for (j in test$dt) 
  subs <- rbind(subs,subset(full,full$dt>(j-i) & full$dt<(j+i)))
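
The element-wise behaviour described above can be seen on a toy example. This is only a sketch: the vectors full_dt and test_dt and their values are made up for illustration and are not part of the original data.

```r
# Toy data: ten time points and two timestamps of interest (hypothetical values).
full_dt <- 1:10
test_dt <- c(3, 8)
i <- 2

# Element-wise comparison: test_dt is recycled to 3, 8, 3, 8, ... so each
# element of full_dt is compared with only one of the two timestamps.
recycled <- full_dt[(full_dt > test_dt - i) & (full_dt < test_dt + i)]

# Compare every time point with every timestamp instead: one logical column
# per timestamp, then keep the rows where any column is TRUE.
in_window <- sapply(test_dt, function(j) full_dt > j - i & full_dt < j + i)
intended <- full_dt[rowSums(in_window) > 0]

print(recycled)
print(intended)
```

With these values, recycled keeps only 3 and 8, while intended keeps 2, 3, 4, 7, 8 and 9, which is what the loop version produces.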

I feel that there might be a better way than a loop, and this article implores us to avoid R loops as much as possible. I might also run into performance issues, as this would be at the heart of an optimization algorithm. Any suggestions from gurus would be greatly appreciated.

EDIT:

Here is some reproducible code that shows the wrong approach as well as the approach that works but could be better.

# create a time series
full <- data.frame(dt = 1:200, val = rnorm(200, 0, 1))

# my smaller array of points of interest
test <- data.frame(dt = seq(5, 200, by = 23))

# my range around the points of interest
i <- 3

#the wrong approach
subs <- subset(full,(full$dt>test$dt-i) & (full$dt<test$dt+i))

#this works, but not sure this is the best way to go about it
subs<-data.frame()
for (j in test$dt) 
  subs <- rbind(subs,subset(full,full$dt>(j-i) & full$dt<(j+i)))

EDIT:
I updated the values to better reflect my use case, and I see @mrdwab's solution pulling ahead unexpectedly and by a wide margin.

I am using benchmark code from @mrdwab and the initialization is as follows:

set.seed(1)

full <- data.frame(
  dt  = 1:15000000,
  val = floor(rnorm(15000000,0,1))
)


test <- data.frame(dt = floor(runif(24,1,15000000)))

i <- 500

The benchmarks are:

       test replications elapsed relative
2    mrdwab            2    1.31  1.00000
3 spacedman            2   69.06 52.71756
1    andrie            2   93.68 71.51145
4  original            2  114.24 87.20611

Totally unexpected. Mind = blown. Can someone please shed some light on this dark corner and explain what is happening?

Important: As @mrdwab notes below, his solution works only if the vectors are integers. If not, @spacedman has the right solution.
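
The integer restriction follows from how the %in% approach works: it enumerates the whole numbers inside each window and matches them exactly, so a timestamp such as 4.5 would never be found. For non-integer data the comparison has to stay a range check. A minimal sketch of that idea is below; in_any_window is a hypothetical helper written for this example (a plain-loop variant of the range test, not @spacedman's closure code), and the data values are invented. A plausible reason for the speed gap on the large data, not measured here, is that %in% makes a single lookup pass over full$dt, while range checks scan the whole vector once or twice per timestamp.

```r
# Hypothetical helper: TRUE where x lies strictly within i of any centre.
in_any_window <- function(x, centers, i) {
  keep <- logical(length(x))
  for (cj in centers) {
    # OR in the window around this centre; works for integers and doubles.
    keep <- keep | (x > cj - i & x < cj + i)
  }
  keep
}

# Invented example data with non-integer time stamps.
full <- data.frame(dt = seq(0.5, 20, by = 0.5))
full$val <- seq_len(nrow(full))
test <- data.frame(dt = c(5.25, 15))
i <- 1

subs <- full[in_any_window(full$dt, test$dt, i), ]
print(subs$dt)
```

Only the logical vector is updated inside the loop, so no data frame is grown with rbind; the single subset happens at the end.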

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 11:02 am

    I don’t know if it’s any more efficient, but I would think you could also do something like this to get what you want:

    subs <- apply(test, 1, function(x) c((x-2):(x+2)))
    full[which(full$dt %in% subs), ]
    

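    I had to adjust your “3” to “2” because your comparison uses strict inequalities: x - 3 and x + 3 are themselves excluded, so for integer timestamps the window runs from x - 2 to x + 2 inclusive.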
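
The adjustment from 3 to 2 can be checked by running both versions on the question's sample data and comparing the results. This is a sketch under the assumption that dt holds integers; the names loop_subs, window_dt and in_subs are introduced here for illustration only.

```r
set.seed(1)
full <- data.frame(dt = 1:200, val = rnorm(200, 0, 1))
test <- data.frame(dt = seq(5, 200, by = 23))
i <- 3

# Loop version from the question: strict inequalities on both sides.
loop_subs <- data.frame()
for (j in test$dt) {
  loop_subs <- rbind(loop_subs, full[full$dt > (j - i) & full$dt < (j + i), ])
}

# Enumerate the integers strictly inside each window, then match with %in%.
window_dt <- unlist(lapply(test$dt, function(x) (x - (i - 1)):(x + (i - 1))))
in_subs <- full[full$dt %in% window_dt, ]

# Both versions should select the same rows in the same order.
print(identical(loop_subs$dt, in_subs$dt))
```

The two results agree here because the windows do not overlap. If two windows overlapped, the loop would return the shared rows twice, whereas %in% returns each row once.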

    Benchmarking (just for fun)

    @Spacedman leads the way!

    First, the required data and functions.

    ## Data
    set.seed(1)
    
    full <- data.frame(
      dt  = 1:200,
      val = rnorm(200,0,1)
    )
    
    test <- data.frame(dt = seq(5,200,by=23))
    
    i <- 3 
    
    ## Spacedman's functions
    cf = function(l,u){force(l);force(u);function(x){x>l & x<u}}
    OR = function(f1,f2){force(f1);force(f2);function(x){f1(x)|f2(x)}}
    funs = mapply(cf,test$dt-i,test$dt+i)
    anyF = Reduce(OR,funs)
    

    Second, the benchmarking.

    ## Benchmarking
    require(rbenchmark)
    benchmark(andrie = do.call(rbind, 
                               lapply(test$dt, 
                                      function(j) full[full$dt > (j-i) & 
                                        full$dt < (j+i), ])),
              mrdwab = {subs <- apply(test, 1, 
                                      function(x) c((x-(i-1)):(x+(i-1))))
                        full[which(full$dt %in% subs), ]},
              spacedman = full[anyF(full$dt),],
              original = {subs <- data.frame()
                          for (j in test$dt) 
                            subs <- rbind(subs, 
                                          subset(full, full$dt > (j-i) & 
                                            full$dt < (j+i)))},
              columns = c("test", "replications", "elapsed", "relative"),
              order = "relative")
    #        test replications elapsed  relative
    # 3 spacedman          100   0.064  1.000000
    # 2    mrdwab          100   0.105  1.640625
    # 1    andrie          100   0.520  8.125000
    # 4  original          100   1.080 16.875000
    